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UNIFIED. SCALABLE. FLEXIBLE. 


Across all industries the demands of data infrastructure have soared to new heights. 


As capacity requirements continue to rise at an ever-increasing rate, performance must not be compromised. The hybrid 
architecture and advanced software capabilities of the TrueNAS appliance enable users to be more agile, effectively 
manage the explosion of unstructured data and deploy a centralized information storage infrastructure. Whether it’s 
backing virtual machines, business applications, or web services, there’s a TrueNAS appliance suited to the task. 


TrueNAS™ Storage Appliances: Harness The Cloud 


iXsystems’ TrueNAS Appliances offer scalable high-throughput, low latency storage 


All TrueNAS Storage Appliances feature the Intel® Xeon® Processors 5600 series, powering the fastest data transfer 
speeds and lowest latency possible. TrueNAS appliances come in three lines: Performance, Archiver, & High Availability. 
High-performance, high-capacity ioMemory modules from Fusion-io are available in the TrueNAS Enterprise, Ultimate, 
and Archiver Pro models. 


Key Features: 
¢ One or Two Six-Core Intel® Xeon® Processors 
5600 series 
¢ Share Data over CIFS, NFS and iSCSI 


* Hybrid storage pool increases performance and 
decreases energy footprint 

* 128-bit ZFS file system with up to triple parity 
software RAID 


*Optional component 
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Dear Readers, 
Here comes another issue of BSD Magazine, what couldn't be 
possible without your support and interest in the subject. 

| think that all who read it, believes as | do, that the future 
belongs to Open Source Software. Every year the OS makes the 
step forward and although there will always exist big companies 
with their software, | think that it’s a matter of time when the 
powers will shift. More and more not only common users, 
but also companies are in favor of OS. And what is the most 
valuable — there are more young and skilled developers who 
choose the OS path. I’m sure that not only among our authors, 
but also among our readers there lots of them. 

! would like to encourage ti 0 are still uncertain or 
humbled by successes of ott ers, to-try themselves in this task. 
Fresh ideas and passion’ can overcome the experience, and 
there is no better Way to gain the experience than to follow the 
ideas, which come out of passion. So, don’t wait longer, just start 
to act. I'm sure that many of you will be surprised with the effect. 

We may choose only the next step we will take. The rest 
just don’t belong to us anymore, but to the past. And with this 
thought | leave you now, so you could enjoy the August iSsue of 
BSD Magazine. 


Wish you a goodsread! 
Patrycja Rrzybylowicz 
& BSD Team 
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What’s New 


OG MaheshaBSD Server Edition Has Been 
Just Released! 
By Juraj Sipos 
Many newcomers to FreeBSD find it difficult to setup their 
own FTP/WWW server quickly and, on the other hand, 
experienced users sometimes need to take precautions 
for unexpected crash situations — that is, to have a strat- 
egy for time economization and portability, as these two 
are valuable assets in our rushing world. From this article 
you will find out ow to run a simple and smart FTP/WWW 
server. 


TM 


How To 


410 Tuning ZFS on FreeBSD 
By Martin Matuska 

ZFS is a modern 128-bit file system based on the copy- 
on-write model. It originates from the OpenSolaris proj- 
ect and has first appeared in FreeBSD in 2008. ZFS has 
many innovative features including an integrated vol- 
ume manager with mirroring and RAID capabilities, data 
checksumming and compression, writable snapshots that 
can be transferred between systems and many more. In 
this article the author is going to discuss several tuning 
options including sysctl(2) knobs and give examples how 
can ZFS performance and efficiency can be measured 
and evaluated. This article is intended for FreeBSD users 
with ZFS version 28 available since 8.3-RELEASE and 
9.0-RELEASE. 
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G MPD5 - VPN Server with FreeBSD Setup 
and Management 
By Antonio Francesco Gentile 
Mpd5 is a fast, flexible and secure way to make VPN con- 
nections on FreeBSD. It requires very few resources and 
supports a wide range of protocols, a great tool for net- 
work managers. By reading this article you will learn to 
setup and manage a VPN server PPTP based. 


20 PostgreSQL Partitioning (part 1) 

By Luca Ferrari 
In the previous articles the main features of PostgreSQL, 
including server-side programming were shown. In this ar- 
ticle a simple application scenario will be used to demon- 
strate the capability of partitioning huge amounts of data 
into different tables in different spaces transparently. 


Security 


34 DNSSEC Part 4: Securing DNS 
Transactions 
By Paul Ammann 
In the June 2012 issue, we outlined the threats, securi- 
ty objectives, and protection approaches for various DNS 
transactions. This article provides the steps involved in 
implementing those approaches, as well as operational 
best practices that go with those implementations. 
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MaheshaBSD 


Server Edition Has Been Just Released 


Many newcomers to FreeBSD find it difficult to setup their 
own FTP/WWW server quickly and, on the other hand, 
experienced users sometimes need to take precautions for 
unexpected crash situations — that is, to have a strategy 
for time economization and portability, as these two are 


valuable assets in our rushing world. 


What you will learn... 
+ How to runa simple and smart FTP/WWW server. 


USB flash drive with an operating system running 
Ae it answers many questions when portability and 
time effectiveness become indispensable. All you 
need to do is to plug your USB flash drive into the USB 
port. The computer’s operating system stays unaltered 
and geared up for any task it is used for. And USB flash 
drives boot several times faster than CD’s or DVD's. 
Readers who will only randomly find this article should 
know that MaheshaBSD and MaheshaBSD Server edition 
are two different things, although tightly related to each 
other. The article about MaheshaBSD was already pub- 
lished in the March 2012 issue of BSD Mag (http://bsd- 
mag.org/magazine/1795-nessus-exploitation-tools-and- 
payloads). 


The Story Behind MaheshaBSD Server 

A year ago | bought a second-hand IBM notebook and the 
experienced technician in the store there told me, after we 
discussed which operating system to use, that he would 
call me if he had a customer interested in an FTP/WWW 
server running FreeBSD. | welcomed the offer. However, 
| was surprised to learn that it was difficult for a proficient 
Linux user to setup an FTP server on FreeBSD. 

Some time later a friend asked me if | knew of any good 
FTP server software for Windows. Then another friend 
asked me to recommend him some software which coun- 
seling psychologists could use to share data. As many 
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What you should know... 


« Howto mount drives in FreeBSD. 
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Figure 1. /n the server edition of MaheshaBSD transliteration of 
Sanskrit works too (Seamonkey) 
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Windows users consider Unix to be the unknown ocean 
in which they do not want to dip their toes, my only choice 
was to look for Windows software. | looked on the Internet 
and was stunned by the prices people are willing to pay 
for Windows FTP server solutions. Some of them climb up 
even up to 500 US dollars. My head whirled round. Thus, 
the idea to make my own easy FTP/WWW server solution 
was born and finally MaheshaBSD Server edition slith- 
ered to day light — a smart and straightforward FTP/WWW 
server, something | was looking for. It runs off a USB flash 
drive and it does not require a Windows license. It is free 
for personal use. 

A decision to sell my FTP/WWW server solution has two 
goals: 


« A wish to donate some money to FreeBSD and 
OpenBSD (sorry | do not mention other BSD sys- 
tems here; | am only familiar with the two mentioned 
above). 

« | have a few friends in very poor countries and | 
would like to help them (Figure 1). 
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MaheshaBSD Server Edition Has Been Just Released 


Unique Features of MaheshaBSD Server 

All users, whether experienced or not, whether friends of 
Unix or not, will have an immediate possibility to operate 
a simple FTP/WWW server off a USB flash/hard drive. 
Portability and time effectiveness are vital assets also for 
experienced users. If you work in a small business and 
your hard drive or computer with a server system on it 
fails to boot one day (imagine a situation your boss does 
not want to invest money in an expensive IT infrastruc- 
ture), it may take several hours to buy a new hard drive, 
install your favorite server system on it, and configure 
it. 

USB flash drives are very fast and with MaheshaBSD 
Server you will get the operating system (FreeBSD) and 
a superbly easy FTP/WWW server into which you just 
copy files via SFTP. A prospect to use NTFS (and FAT32) 
disks/flash drives with write access and use them as a 
place for immediate FTP/WWW data storage will appear 
very practical not just for Windows users. Please do not 
ridicule me for the above statement, as for many peo- 
ple out there it is still much easier to use a spare NT- 
FS/FAT32 drive in Unix (without much knowledge about 
Unix) than to format it, as the data on it may have a val- 
ue and backing it up takes time too. Unfortunately, not 
many BSD LiveCD/USB projects support writing to NTFS 
drives. 


Unique features 
MaheshaBSD Server supports quotas. 
MaheshaBSD Server has a remote administration tool 
(Webmin; Figure 2). 
You can work with any hard disk (NTFS partitions too). 
MaheshaBSD Server (FTP/WWW/Webmin) is connect- 
able on aliased IP’s on LAN that all end with 200 and 


winscpton ax 
- Seer 
Host name: Post ramber: 
fis2.1681.200 | 2aj 
User name: Password 
Private key file: 
| = 
Protecet 
Fille protocet [SFTP =] Fe Allow SCP falback 
Select color 
[ Sdvanced optons 
et _| _teonon | _fer._| _ome_| 


Figure 3. To log in to Webmin, use any of the addresses specified 
above, as you see on the picture 
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Figure 4. To fetch your passwords, you must log in as user guest5 and 
not as user vsftpd, as you see on the picture 
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which all can be used as follows (depending on your LAN 
configuration): 10.0.0.200, 10.0.1.200, 192.168.0.200, 
192.168.1.200, 172.16.0.200, and 172.16.1.200 (Figure 
3). 
SSH works with the “boss” account only. This is a secu- 
rity measure. 


A Practical Example How To Use The 
MaheshaBSD Server's FTP/WWW Server 
The text is written also with focus on absolute beginners. 
Download the USB image from the following URL: 
ftp://2227.x.rootbsd.net/index. html. 
Unrar it and write it to your USB flash drive or USB hard 
drive. 
In FreeBSD: 


dd if=/path/MaheshaBSD9-server.img of=/dev/da0 
bs=10240 conv=sync 


In Windows use a program such as Winimage. 


C3 .snap 16. 6. 2012 14:... 

29.5. 2012 20:...  rwnnrexe-- 
Ciboss 22.5, 2012 1:1... rwnxrexe- 
Ciguest 27.1, 2012215... rwnerexe= 
Coquest: 22.5. 2012 1:5... rwnxrex 
Ciquest2 22,5, 2012 1:5... rwnxrexe= 
Coquests 22.$, 2012 1:5... rwnrexee= 
Ciquests 22.5. 2012 1:5... rwnrex- 
(quests 29.5. 2012 20:...  rwncrexe-> 
Cpublic_tmp 21.5. 2012 23:...  manqworwx 
Civsftpd 29.5. 2012 203... rwnerexeee 


Figure 5. After you log in, you will see all the user directories in /home 
with one shareable directory (public_tmp) 


2 
Loc Mark Filet Commands Session Cptone Remote fate 
@ig- BSZe EF Sa * ¥ 4 S| tee = G+ ee owe ts GA 
at eomees AS ~ GID 1D | te | peees *Siees- (UGS 3\% 


Wem 

2S mtr 1s 
ree 2S motte 
mm Sh mre wore 
eS 20S erie ewe 
mt ME mie 
1 mS mre rece 


mS mie mee 
™m Bt me 
= DS mize rere 


poet m 6 ele 


Figure 6. To see your passwords, first click on the guest5 directory and 
then on passes.txt 
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When you boot your computer with MaheshaBSD Serv- 
er, simply use programs such as Winscp for Windows (on- 
ly SFTP protocol works), or SFTP in Unix, and fetch your 
passwords (log in as guest5): 


¢ Login: guest5 
e Password: guest6 


Then just log in to your vsftpd account (via SFTP) and 
copy anything to the vsftpd > ftp directory (Figure 4-6). 

Of course, you can do all of the above steps also with 
physical access to the computer where MaheshaBSD 
Server is running. When you boot your computer off your 
USB flash/hard drive, you will see the password for root 
in a blue text. Then just log in and type (Figure 7 and Fig- 
ure 8): 
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Figure 7. All you need is to copy any file to the vsftpd > ftp (or apache 
> www) directory to have it immediately displayed in your browser 
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Figure 8. We chose the file winimage.exe as an example here; the file 
is immediately displayed in the MaheshaBSD Server’s FTP server 


ad | x 


This page is used to test the proper operation of the FreeBSD \aheshalbSD 
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more /home/guest5/passes.txt in your shell 


| also tested this with a NTFS drive redirected via 
mount_nullfs and | copied a 700 MB file onto it without 
any problems. 

All other details, if you are more interested in this proj- 
ect, are in the documentation available at: http:/Avww. 
freebsd.nfo.sk/MaheshaBSDserverManual.pdf. 


Conclusion 
Groups tend to live inside themselves and often without 
knowledge of other waters. Thus, the MaheshaBSD Serv- 
er’s goal is not to be penchant for one group only. To at- 
tract new users to the BSD world requires giving them 
some of the water they were accustomed to bathe in. If 
a child sets up IP Forwarding in its router and will easily 
start its own WWW/FTP server from home, this may be its 
first challenge to look at FreeBSD. To easily share data 
can be the same challenge for scholars, teachers, small 
businesses, and actually for all Windows users, too, if 
they happen to ask questions where to find a simple FTP/ 
WWW server solution. Not many non-Windows choices 
(although usable with Windows) await them on their mar- 
ket with FTP/WWW servers. | made this work also with 
hope that skilled Linux users (like the technician men- 
tioned in the beginning of this article), who do not know 
how to set up an FTP/WWW server with FreeBSD, might 
be also challenged to look at this project. 

| thank http:/www.rootbsd.net for allowing me to distrib- 
ute MaheshaBSD and MaheshaBSD Server. 


JURAJ SIPOS 

Juraj lives in Slovakia and he works in a library in an educa- 
tional institute. Some time in the past he was fortunate to trav- 
el around the world and he spent a bit of time in India and Aus- 
tralia. Juraj’s hobbies are computers, mostly Unix, but spiritual- 
ity too. His first published computer article was Xmodmap How- 
to (http://tldp.org/HOWTO/Intkeyb/). In addition to computers, 
he is very interested in Hinduism but not really the guru side of 
things, but more-so freedom and self-actualization. More at his 
website: 

http://www.freebsd.nfo.sk/ (FreeBSD) 
http://www.freebsd.nfo.sk/maheshaeng.htm (MaheshaBSD) 
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BSD Certification 


The BSD Certification Group Inc. 
(BSDCG) is a non-profit organization 
committed to creating and 
maintaining a global certification 
standard for system administration 
on BSD based operating systems. 


@ WHAT CERTIFICATIONS ARE AVAILABLE? 


BSDA: Entry-level certification suited for candidates 
with a general Unix background and at least six months of 
experience with BSD systems. 


BSDP: Advanced certification for senior system administrators 
with at least three years of experience on BSD systems. 
Successful BSDP candidates are able to demonstrate 

strong to expert skills in BSD Unix system administration. 


Q WHERE CAN | GET CERTIFIED? 


We're pleased to announce that after 7 months of 
negotiations and the work required to make the exam 
available in a computer based format, that the BSDA 
exam is now available at several hundred testing centers 
around the world. Paper based BSDA exams cost $75 USD. 
Computer based BSDA exams cost $150 USD. The price of 
the BSDP exams are yet to be determined. 


Payments are made through our registration website: 
https://register.bsdcertification.org//register/payment 


@ WHERE CAN | GET MORE INFORMATION? 


More information and links to our mailing lists, Linkedin 
groups, and Facebook group are available at our website: 
http://www.bsdcertification.org 


Registration for upcoming exam events is available at our 
registration website: 
https://register.bsdcertification.org//register/get-a-bsdcg-id 
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Tuning ZFS on FreeBSD 


ZFS is amodern 128-bit file system based on the copy-on-write 
model. It originates from the OpenSolaris project and has first 
appeared in FreeBSD in 2008. ZFS has many innovative features 
including an integrated volume manager with mirroring and 
RAID capabilities, data checksumming and compression, writable 
snapshots that can be transferred between systems and many 


more. 


What you will learn... 
+ how to optimize ZFS for various applications and workloads 
+ how to measure and evaluate ZFS cache efficiency 


tions including sysctl(2) knobs and give examples how 

can ZFS performance and efficiency can be measured 
and evaluated. This article is intended for FreeBSD users 
with ZFS version 28 available since 8.3-RELEASE and 
9.0-RELEASE. 

ZFS has a lot of tuning options accessible via the sys- 
ctl(8) command. In addition, the ZFS part of the Open- 
Solaris kstat (kernel statistics facility) framework has also 
been made available on FreeBSD providing raw statistical 
data in form of various counters. There are more than 60 
vfs.zfs knobs and over 80 kstat.zfs knobs providing ac- 
cess to miscellaneous kernel counters, state and sizing 
variables. There is currently only a very limited amount of 
tools available that process this information. 

The first sections of this article are going to focus on 
tuning zfs prefetch and caches, introducing the zfs-stats 
and zfs-mon statistics processing tools for measurement 
and evaluation. They are followed by individual tuning tips 
like using ZFS on web, database or file servers, and opti- 
mizations for the advanced format (4k sector) drives. 


| n this article | am going to discuss several tuning op- 


General Tuning Tips 

RAM memory 

The amount of system RAM has a significant impact on 
ZFS performance, especially if using deduplication. Many 
issues can be cured by increasing system’s RAM memory. 


5 
g 
g 
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What you should know... 
+ ZFS system administration basics 
+ working with sysctl(8) and loader(8) tunables 


The recommended memory minimum is 1GB, but | sug- 
gest for highly utilized setups at least 8GB of system 
RAM. For server use error correcting ECC memory is an 
advantage. 


Access time 

On a FreeBSD system, every time a file is accessed its 
access time (atime) gets updated. This may generate a lot 
of disk write activity on servers working with a large num- 
ber of files. In such a case you might want to disable atime 
(access time) for the affected datasets or for the whole 
pool. Turning off atime may improve performance of on all 
types of application servers. 


# zfs set atime=off dataset 


Dataset compression 

Using ZFS dataset compression saves space but has a 
negative effect on system’s CPU performance and re- 
sponsiveness. On the other hand, enabling compression 
(mainly LZJB) may increase your data throughput, espe- 
cially for slow storage. In particular the gzip compression 
costs essentially more CPU time then the less-compress- 
ing LZJB compression. Therefore | recommend using da- 
taset compression only if you have compressible data and 
the dataset is not a performance bottleneck or if you are 
generally low on storage space. Datasets with low activity 


08/2012 


containing e.g. log files are good candidates for gzip com- 
pression. If you have fast storage with enough space and 
need top-notch performance, disable dataset compres- 
sion for the affected datasets. 


# zfs set compression=[on|off] dataset 


Deduplication 

ZFS deduplication is a relatively new feature that enables 
you to save space by keeping a single copy of data that 
is available on your ZFS dataset in multiple copies. De- 
duplication requires a large amount of RAM memory. The 
ideal situation is if your whole deduplication table fits into 
memory otherwise you may experience decreased sys- 
tem performance. Deduplication can be enabled and/or 
disabled on a per-dataset basis: 


# zfs set dedup=on 


You can view (detailed) deduplication information about 
a pool using the zdb command: 


# zdb -D pool 
or 
# zdb -DD pool 


It is possible to simulate the effect of enabling dedupli- 
cation for a pool. To verify possible space gains, a re- 
sulting deduplication ratio of more than 2.00 indicates a 
good candidate for deduplication: 


# zdb -S pool 


A discussion of the benefits and costs of ZFS dedupli- 
cation is available at the blog of Constantin Gonzales 
(Attp:/constantin.glez.de/blog/2011/07/zfs-dedupe-or-not- 
dedupe). 


ZFS send/receive 

If you are using send and receive in the same run (if you 
are not streaming into a file but piping zfs send directly to 
zfs receive) then you should consider using a buffering so- 
lution to speed up the process. | personally recommend the 
“mbuffer” program available in the FreeBSD ports tree as 
misc/mbuffer. Mbuffer allows you to buffer local streams (via 
a pipe) and is capable of standalone network streaming. 


# zfs send tank/a@s1 | mbuffer -m 128M | zfs receive 
tank2/a@s1 
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Cache and Prefetch Tuning 

Adaptive Replacement Cache (ARC) 

One of the primary tunable ZFS features is the memo- 
ry-based Adaptive Replacement Cache (ARC). Data and 
metadata of blocks read from disk devices are stored in 
this cache. ARC provides a major speedup to ZFS opera- 
tions and is enabled by default. 

The main loader(8) tunables for ARC are: 


* vfs.zfs.arc max: Maximum ARC size (in bytes) 
* vfs.zfs.arc_ min: Minimum ARC size (in bytes) 
* vfs.zfs.arc meta limit: ARC metadata limit (in bytes) 


With these tunables you can control the size limits of the 
ARC cache on your system. The minimum and maxi- 
mum values for ARC are automatically sized on system 
boot, depending on the installed RAM memory. The de- 
fault values are: 


° vfs.zfs.arc_ max: physical RAM less 1 GB (or vm.kmem _ 
size, whatever is smaller) 

° vfs.zfs.arc_meta_ limit: 1/4th of arc _ max 

half of arc meta _ limit (equals to 


vis.zfs.arc min: 


1/8th of arc = max) 


Both arc_ min and arc_ max have a hard-coded minimum 
of 16MB and both arc meta _ limit and arcmin May not 
be higher then arc_ max. 

To display your current ARC size (in bytes), run: 


# sysctl kstat.zfs.misc.arcstats.size 


Default values should be sufficient for most users, as if 
your system requires memory for other tasks, the ARC 
memory is automatically freed, shrinking down to the 
arc min value. Changing vis.zfs.arc max is of advan- 
tage only if you need explicitly reserved memory for oth- 
er use. Alternatively there may be situations where your 
metadata cache gets filled up (e.g. you have lots of small 
files) and you observe bad performance of your ARC. In 
such a situation you may want to increase the arc_ meta _ 
1imit (loader(8) tunable). Please note that arc min iS en- 
forced to be at least the half of arc meta limit. 

To display your current ARC metadata usage and meta- 
data limit (in bytes), run: 


# sysctl vfs.zfs.arc_meta_used 


# sysctl vfs.zfs.arc_meta_ limit 


By default, ARC is enabled for all datasets. If you ex- 
perience performance problems because of full and 
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inefficient ARC you can decide to disable ARC or limit it 
to cache only metadata on non-critical datasets: 


# zfs set primarycache=[all|metadata|none] dataset 


For displaying uptime and real-time ARC activity and ef- 
ficiency, please refer to the “zfs-stats and zfs-mon” sec- 
tion of this article. 


Level 2 Adaptive Replacement Cache (L2ARC) 
Fast block devices (e.g. SSD drives) can be used to ex- 
tend the ARC cache of specific ZFS pools. This cache is 
called Level 2 Adaptive Replacement Cache (L2ARC). 
This cache type is optimal mainly in write-once and read 
many storage scenarios, e.g. the static content of inter- 
net websites with image and video files that do not get 
overwritten (but the total space used keeps growing). It 
requires fast cache devices (use of fast SSD drives is rec- 
ommended). On the contrary to ARC being shared for the 
whole system, L2ARC devices are pool-bound and cache 
only the data of the pool they are attached to. 

The size of L2ARC cache is defined by the size of the 
cache device(s). To add/remove cache devices from a 
pool, you can use the following commands: 


# zfs add pool cache device 


# zfs remove pool device 


L2ARC provides several system tunables, | am going to 
explain the following: 


vfs.zfs.1l2arc_feed_again: turbo warmup 
vfs.zfs.l2arc_feed_secs: interval secs 
vfs.zfs.1l2arc_write_max: max write size 
vfs.zfs.l2arc_write_boost: extra write during warmup 
vfs.zfs.1l2arc_headroom: number of dev writes to precache 


vfs.zfs.l2arc_noprefetch: don’t cache prefetch bufs 


The first tunable above enables or disables L2ARC turbo 
warm-up. The turbo warm-up phase happens between sys- 
tem bootup and the L2ARC cache getting "warm" (= the 
first time L2ARC evicts data). During this phase the ARC 
write speed is calculated as i2zarc_write max + 
write boost in bytes every 12arc_ feed _ secs seconds. If the 
warm-up phase is disabled or L2ARC is already in warm 
state, data is written at a maximum speed of i2arc_ write _ 
max bytes every i2arc_ feed _ secs Seconds. Default settings 
are warm-up enabled, 12arc_ feed__ secs Set to one second 
and both l2arc write max and i2arc _ write boost set to 
8MB. The i2arc_ headroom tunable defines the number of 
L2ARC writes to be precached. The default value is 2. 
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The goal of these settings is to avoid overwriting SSD 
devices too quickly as these have a limited number of 
overwriting cycles. These settings have been defined in 
2008 and modern SSD drives can easily operate at faster 
speeds. There are recommendations to at least double 
the vis.zfs.l2arc write max and vis.zfs.1l2arc write boost 
loader(8) tunables to 16MB. 

The i2arc_noprefetch tunable is set to 0 by default. This 
disables caching of sequential reads. Enabling this set- 
ting (value of 1) does in many cases improve the L2ARC 
performance, e.g. for video streaming or web-serving of 
large files. 


# sysctl vfs.zfs.l2arc_noprefetch=1 


By default, L2ARC is enabled for all datasets. You can 
disable L2ARC or limit it to cache only metadata as a 
per-dataset setting: 


# zfs set secondarycache=[all|metadata|none] dataset 


For displaying uptime and real-time L2ARC activity and 
efficiency, please refer to the “zfs-stats and zfs-mon” 
section of this article. 


ZFS Intent Log (ZIL) 

The ZFS Intent Log provides synchronous semantics for 
ZFS. It is used to guarantee data consistency on fsync(2) 
calls. It is used to replay data transactions in case of a ker- 
nel panic, hardware or power failure. 

By default, the ZFS Intent Log takes up a small amount 
of storage space of each ZFS pool. To speed up synchro- 
nous writes, a separate log device (including a mirrored 
device) may be used. Fast SSD drives are recommended 
for this purpose. 

To add or remove a log device to a pool, use the follow- 
ing commands: 


# zpool add tank log device 


# zpool remove tank device 


It is possible to configure the synchronicity behaviour on 
a per-dataset setting: 


# zfs set sync=[standard|always|disabled] dataset 


By setting sync to disabled, data is written to storage on- 
ly on periodical (TXG — Transaction Group) write times. 
This improves write performance but introduces the risk 
of losing data during a kernel panic, hardware or power 
failure and makes this option interesting only for tempo- 
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rary and volatile data. Setting sync to always has a large 
performance penalty. 


File-level Prefetching (zfetch) 

The file-level prefetching mechanism implemented in ZFS 
is named “zfetch”. This mechanism analyses read patterns 
of files and tries to predict next reads resulting in reduction 
in application response times. In some workloads, zfetch 
may be CPU-intensive and limit scalability. The efficiency 
of prefetch can be displayed and monitored with the “zfs- 
stats” and “zfs-mon” tools (discussed later). Zfetch is en- 
abled by default (disabled on systems with less than 4GB 
of RAM) and you may disable or re-enable it by setting the 
following loader tunable to 1 (disable) or 0 (enable): 


vfs.zfs.prefetch_disable: Disable prefetch 


Device-level Prefetch (vdev prefetch) 

The vdev prefetch mechanism does pre-read data af- 
ter small reads from pool devices. Currently, the vdev 
prefetch cache is disabled by default. Long-term use has 
shown that it is inefficient in most cases and its memory 
consumption is proportional to the number of vdevs on 


Listing 1. Example output of “zfs-stats” 


ARC Size: WISIN Leis Gils} 
Target Size: (Adaptive) P2822 25. GiB 
Min Size (Hard Limit): 12.503 4.00 GiB 
Max Size (High Water): eal 32.00 GiB 

ARC Efficiency: 20 
Cache Hit Ratio: 90.52% 13b 
Cache Miss Ratio: 9.48% 18.08m 


Actual Hit Ratio: 


Data Demand Efficiency: 


Data Prefetch Efficiency: 


L2 ARC Breakdown: 18.18m 
FaveaRaltdor 62.87% 74.29m 
Miss Ratio: ohsloe 23..89m 
Feeds: 849.64k 

File-Level Prefetch: (HEALTHY) 

DMU Efficiency: 28.09b 
Hit Ratio: 88.54% 24.87b 
Miss Ratio: 11863) S.225 
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a system. There are user reports that re-enabling vdev 
cache may significantly speed up the scrub (and resilver) 
process on raid-z devices (or systems with slow drives) 
by reducing disk seek times and speeding up metadata 
reads. 

The vdev prefetch is primarily controlled by the vfs.zfs. 
vdev.cache.size loader(8) tunable that contains the per- 
vdev cache size in bytes and is disabled by default (value 
of 0). For expreienced users, there are additional tunables 
for fine-tuning available under the vis. zfs.vdev.cache load- 
er(8) tunable group. 


# sysctl -d vfs.zfs.vdev.cache 


To enable vdev prefetch, set vés.zts.vdev.cache in loader. 
conf(5) to a desired size in bytes different from Zero, e.g. 
previous default value of 10 megabytes: 


vfs.zfs.vdev.cache.size=10485760 


zfs-stats and zfs-mon: ZFS Statistics Tools 

The xstat.zfs sysctl(8) knobs provide access to many ZFS 
counter variables. These variables contain raw data and 
to make any conclusions from these variables, interme- 
diate values need to be computed. The perl scripts zfs- 
stats and zfs-mon do process this data and provide hu- 
man-readable output. The zfs-stats tool is based on Ben 
Rockwood’s arc_summary.p1 and includes modifications by 
Jason J. Hellenthal and myself. Both tools are available in 
the FreeBSD ports tree as sysutils/zfs-stats. The data in 
zfs-stats summarizes and/or averages counters that col- 
lect data since the system was booted. Example output 
excerpt from zfs-stats: Listing 1. 

The uptime averages do not tell much about the actu- 
al system performance. To view real-time cache efficien- 
cy (or raw numbers) | have written the “zfs-mon” tool. It 
monitors ARC, L2ZARC and Zfetch in real time and outputs 
10s, 60s and total per second averages (total = since the 
program was started). 

Example “zfs-mon -a” output after collecting 120 sec- 
onds of data: Listing 2. 

As of total cache efficiency, the L2ARC cache is ac- 
cessed only on anARC miss, so your total cache efficiency 
is calculated using the following formula: [arc efficiency] 
+ (100-[ARC efficiency])*([L2ARC efficiency]/100). 

Result for the example above: 
89,96)*(71,15/100) = 97,10. 


89,96 + (100- 


Interpreting the output from zfs-stats and zfs-mon 
The output of zfs-stats and zfs-mon may help you to 
discover bottlenecks and decide to change some de- 
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fault values. The main values to look at are usage and 
efficiency of various caches. An efficiency value of 
100% means all reads are done from the cache, 0% 
means all reads are done from disks. For my uses, ef- 
ficiency above 80% counts as efficient and above 90% 
counts as highly efficient. Remember, that L2ARC 
takes some time to warm-up and is intended to improve 
your total cache efficiency. If using zfs-mon, try to col- 
lect data for a longer period of time and watch the “tot” 
column. 


Here are some general tips: 
Inefficient ARC data cache: 


* if you have limited the ARC size, increase or remove 
the limit 
* disable ARC for some datasets 


Listing 2. Example output of ’zfs-mon -a” (runtime 120 seconds) 


ZFS real-time cache activity monitor 


Seconds elapsed: 120 


Cache hits and misses: 


ARG hattse 259 431 418 466 

ARC misses: oe 40 49 oe 

ARC demand data hits: 223 417 390 437 

ARC demand data misses: 36 20 1) 16 

ARC demand metadata hits: 36 i 25 25 
ARC demand metadata misses: 1S) 13 eae 5) 
ARC prefetch data hits: 0 4 3 4 

ARC prefetch data misses: 0 10 8 
ARC prefetch metadata hits: 0 0 0 0 
ARC prefetch metadata misses: 0 0 i 3 
L2ARC hits: 47 34 40 37] 

L2ARC misses: 4 5 9 15 

ZFETCH hits: 47903 47294 48155 47138 

ZFETCH misses: ote 449 1147 3593 


Cache efficiency percentage: 
10s 60s tot 


INES Mike Gih Lal ILCs) 

ARC demand data: 95.42 95.82 96.47 

ARC prefetch data: 80.00) 23.08 33.33 

ARC prefetch metadata: 0.00 0.00 0.00 
MZARC SO puCmemo lao Sme i leas 

ZERTCH e O9n OG SEG 92,92 
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* considering lowering the ARC metadata limit 
* add more RAM to your system 
* consider using additional L2ARC cache devices 


Inefficient ARC metadata cache: 


* consider increasing the ARC metadata limit 
* add more RAM to your system 


Inefficient L2ARC cache: 


e this depends very much on the structure of your 
reads 

¢ if your ARC is already very efficient, L2ARC might 
sometimes add only little advantage 

¢ if your ARC is inefficient, too, consider increasing 
system memory and L2ARC 

* in some scenarios L2ARC efficiency of 30-40% may 

already be acceptable 


Inefficient ZFETCH: 
* consider disabling zfetch 
Inefficient vdev prefetch: 


* consider disabling vdev prefetch 

¢ if enabled, scrub and resilver may run substantially 
faster 

* modify advanced vdev prefetch settings (experts on- 


ly) 


Tuning ZFS for Applications 
Webservers 
On FreeBSD, user experience has shown that it is an ad- 
vantage to disable sendfile and mmap on your webserv- 
ers, if you are serving your pages from ZFS datasets. 
Otherwise your data may get cached in your memory 
twice and this reduces your system memory. 

Here are example configuration directives for popular 
webservers: 


Apache 
EnableMMAP Off 


EnableSendfile Off 


Nginx 
Sendfile off 


Lighttpd 


server.network-backend = "writev" 
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Database servers 
For databases like PostgreSQL and MySQL, users rec- 
ommend to store them on a dataset created with a differ- 
ent recordsize than the default of 128 kilobytes. 

For PostgreSQL and MySQL (MyISAM storage), set the 
recordsize to 8 kilobytes before populating the dataset: 


# zfs create -o recordsize=8k tank/mysgl 


For MySQL InnoDB storage files (not logs) set the re- 
cordsize to 16 kilobytes, logs might be left at the default 
record size (you need to spread the data over more da- 
tasets that have different recordsizes). 


NFS servers 

If sharing ZFS datasets on NFS servers with a lot of writes, 
the ZFS Intent Log (ZIL) might be your bottleneck. To im- 
prove performance, you may want to move ZIL to sepa- 
rate log devices (fast SSD drives) or try to disable ZIL for 
the affected datasets. Disabling ZIL may cause NFS client 
corruption. 


# zfs set sync=disabled dataset 


ZFS and 4k Sector Drives 

Originally hard drives used to store data in 512 byte physi- 
cal sectors. Today’s large harddrives use the Advanced 
Format (4k sectors) and provide a 512 byte-sector com- 
patibility mode. On FreeBSD, the mentioned drives usu- 
ally still report a 512 byte sector size: 


adaQ: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C) 


ZFS uses this value when defining the block size for de- 
vices when added to a pool (or when a pool is created). 
To view your pool device configuration, use the following 
command: 


# zdb -C [poolname] 


The parameter “ashift” describes the ZFS block size 
used size as 2 * [ashift]. A value of “9” means 512 byte 
sectors. To have 4 kilobyte blocks, a value of “12” is re- 
quired. 

Users report poor performance with Advanced Format 
hard drives and ahift=9, especially in raidz configurations. 
To create a pool optimized for 4 kbyte sectors, we have 
to have to make the ZFS block size match the physical 
sector size do some tricks with a fake gnop device. Let’s 
assume we want to create a new pool with the /dev/adao 
device: 
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Tuning links 
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_ 
Tuning_Guide 


Technical links 

+ — http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the- 
zfs-arc 
http://dtrace.org/blogs/brendan/2008/07/22/zfs-l2arc 
https://blogs.oracle.com/roch/entry/tuning_zfs_recordsize 
https://blogs.oracle.com/roch/entry/dedup_performance_ 
considerations! 
http://constantin.glez.de/blog/2011/02/frequently-asked- 
questions-about-flash-memory-ssds-and-zfs 
http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector- 
drives.html 


gnop create -S 4096 ada0 
zpool create tank ada0.nop 
zpool export tank 

gnop destroy ada0.nop 


zpool import tank 


SE OSE OE ORE HE 


zdb -C tank | grep ashift 


On a 4k block setup, small files up to 4k always take 
one whole block. ZFS metadata is many times smaller 
than 4kb. Please consider that using ashift=12 increas- 
es the initial space required for metadata by a fairly large 
amount (about 5% of your total disk space). Depending 
on your data, this overhead may increase on filling the 
pool with data (e.g. many small files). So this is effective- 
ly a tradeoff between performance and free space and 
you have to decide which is more important. 


Conclusion 

Default values of ZFS settings are intended to suit the av- 
erage user. This article presented several ways how to 
optimize a system for ZFS and how to tune these values 
for specific workloads. The evaluation tools zfs-stats and 
zfs-mon provide necessary measurement data. 

ZFS is a great piece of software and | use it heavily on 
dozens of systems in combination with FreeBSD, Open- 
Indiana and even Linux. The lack of measurement and 
evaluation tools have inspired me to work on zfs-stats and 
zfs-mon. 
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MPD5 


VPN Server with FreeBSD Setup and Management 


Mpds5 is a fast, flexible and secure way to make VPN 
connections on FreeBSD. It requires very few resources and 
supports a wide range of protocols, a great tool for network 


managers. 


What you will learn... 
+ In this paper we will learn to setup and manage a VPN server PPTP 
based. 


ly suitable for those who need to grant VPN ac- 
cess to external consultants, the so-called “Road 
Warriors”. The operating system is FreeBSD version 
8.2 release, but the configuration is also valid for ver- 
sion 9.x. The internal network will use the address fam- 


T he package we will use is mpd5 and is particular- 


What you should know... 
+ Basics about how to compile BSD Kernel and basic BSD Networking 
Setup 


ily 10.0.0.0/255.255.255.0 and the internal address of the 
BSD firewall will be 10.0.0.1. 

The goal is to setup and manage a number of connec- 
tion links between the PPTP VPN server and external 
users, simultaneously, by using the Netgraph implemen- 
tation of Point-to-Point Tunneling Protocol (PPTP), a sys- 


Stee LAN A 


Figure 1. Mpd5 VPN PPTP connections examples 
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tem used to implement virtual private networks that uses 
a control channel over TCP and a GRE tunnel operating 
to encapsulate PPP packets. 


What is the Netgraph System 

Netgraph is the graph based kernel networking subsys- 
tem and provides an uniform and modular environment 
to implement kernel objects which perform various net- 
working functions, known as nodes. The nodes can be 
assembled in particular structures, so-called graphs, 
whose edges are formed by the nodes “hooks” and are 
used to manage data flows. With netgraph one has a 
flexible and modular implementation to manage con- 
nections and both protocol and network layer, using a 
fast kernel based architecture. Mpd5 can serve several 
Gigabits per second of PPP traffic with right hardware, 
allowing many thousands of simultaneous connections 
without important performance degradation, moreover 
it needs very low resources, finally it supports many of 
existing PPP link types like modem for asychronous se- 
rial connections, pptp for the Point-to-Point Tunnelling 
Protocol (PPTP), l2tp for Layer Two Tunnelling Protocol 
(L2TP), pppoe for Ethernet connections with the PPP- 
over-Ethernet (PPPoE) protocol, tcp/udp to tunnel PPP 
session over a TCP/UDP connection. With different con- 
figurations mpd5 is able to run as a PPP client or server 
and supports many PPP protocols extensions, such as 
MS-CHAP and EAP authentication, it uses traffic encryp- 
tion like MPPE and includes many additional features 
like Network address translation (NAT) support, telnet 
and http control interfaces, IPv4 and IPv6 protocols sup- 
port, and different authentication and accounting meth- 
ods support such as RADIUS, PAM, script, or file. 


Installing mpd5 
One can install mpd5 using ports or binay packages 


cd /usr/ports/net/mpd5 


make install 
pkg_add -rv mpd5 


Note 
Mpd5 documentation is installed in HTML and PostScript 
format into /usr/local/share/doc/mpd. 


Configuring mpd5 

The mpd5 configuration folder is in /usr/1ocal/etc/mpds, 
and its main configuration file is mpa.cont. For the network 
previously shown, here is an example configuration file: 
Listing 1. 


www.bsdmag.org 


MPD5 - VPN Server with FreeBSD Setup and Management 


Listing 1. The mpd5 core configuration file 


root@freeB 


sd82-fw.virtual.test/usr/local/etc/mpd5#cat 
mpd.conf 


HEPTHHAAPEAAP RAPE AAEE EAP ERATE EEA 


# 


MPD configuration file # 


HEPTHHEEPEAAPER AEE PE EERE PETAR AEE ERA EEE EEE 


SicaGEUp!s 
# Disablin 
set consol 


set web cl 


default: 
lo 


pptp_serve 


set 


CE 


se 


se 


se 


se 


se 


se 


g web and console access 
e close 


ose 


ad pptp_server 


digs 

ippool add poolsat 10.0.0.200 10.0.0.220 
eate bundle template B 

t iface enable proxy-arp 

t iface idle 0 

t iface enable tcpmssfix 

t ipcp yes vjcomp 

t ipcp ranges 10.0.0.254/32 ippool poolsat 
E ipep! dns 1020-021 


# Enabling Microsoft Point-to-Point encryption (MPPE) 


se 


se 


se 


t bundle enable compression 


t ccp yes mppc 


t mppc yes compress e40 e56 e128 stateless 


# Creating clonable link template named L 


ots 
# Setting 
se 
# Using Mu 
se 
se 
se 
se 


se 


eate link template L pptp 

bundle template to use 

t link action bundle B 

ltilink for giving full 1500 MTU 
t link enable multilink 


t link yes acfcomp protocomp 


t link no pap chap eap 
t link enable chap 


t link enable chap-msv2 


# Reducing link mtu to avoid GRE packet fragmentation. 


se 


se 


# Configuri 


se 


440 
t link keep-alive 10 60 


t link mtu 


ng PPTP and open link 


t pptp self 0.0.0.0 # So configured,daemon 


listens on all interfaces 


# Allowing to accept calls 


se 


t link enable incoming 
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In the above example configuration, we specified the 
following fundamental settings: 


° set ipcp ranges <mpd-ip-address>/<mask> ippool poolsat: 
the ip address that our mpd5 daemon is listening on, 
in the subnet mask of our internal network. 

° set ipep dns <dns-server>: the DNS server IP address. 

* set pptp self <mpd-ip-address>: Another time this is the 
ip address on which our mpd5 daemon is listening on 


° set ippool add poolstat <ip-rangel> <ip-range2>: the IP 
Pool reserved for PPTP connections. 


Now we must add user accounts that will be autho- 
rized to connect to our PPTP server, they are stored in 
the /usr/local/etc/mpd5/mpd.secret file. This file contains 
username-passwords pairs that are used to authorize 
a user, separated by a whitespace, a particular use 
provides a fixed ip address to a particular username, 


Listing 2. MPD credentials configuration file 


HEH EEE HE HE EEA EEE HE EE EET A EEE EE EE EE EE 


# MPD secrets configuration file # 
Hetetteeee ded ded ted tae tae dt aad ted ted bed tad a aa ed ae eA BE 
user 2012 user 2012 


Osan? 2014 wees 202 N00 06222 


# An external password access program 


user3_2012 “!/usr/local/etc/mpd5/passwd_get.sh” 


Listing 3. An mpd basic firewall ruleset to add to /usr/local/etc/ 
ipfw.rules 


## ipfw firewall ruleset for mpd 


# Macros 


fwemd=/sbin/ipfw 


# Allow Mpd pptp traffic 

Sfwemd add 0554 allow tcp from any to any dst-port 1723 
Sfwomd add 0555 allow tcp from any 1723 to any 

Sfwemd add 0556 allow gre from any to any 


# MPD Vpn Rulesets 


Sfwomd add 0671 allow gre from any to any 

Sfwemd add 0672 allow a from any to any via ng0 
Sfwemd add 0673 allow a from any to any via ngl 
Sfwomd add 0674 allow a from any to any via ng2 
Sfwemd add 0675 allow a from any to any via ng3 
Sfwomd add 0676 allow a from any to any via ng4 
Sfwemd add 0677 allow a from any to any via ng5 
Sfwomd add 0678 allow a from any to any via ng6 
Sfwomd add 0679 allow a from any to any via ng7 
Sfwomd add 0680 allow a from any to any via ng8 


Listing 4. The FreeBSD Netgraph Kernel Modules 


root@freeBsd82-fw.virtual.test/home/utente#kldstat 


Id Refs Address Size Name 


0xc0400000 c67484 
0xc3133000 4000 
0xc3137000 b000 
Oxc31cd000 4000 
Oxc31e2000 7000 


kernel 
ng_socket.ko 
netgraph.ko 


ng_iface.ko 


oO 8 Ww NH BE 
PrP WP © 


ng_ppp.ko 


Listing 5. The FreeBSD Netgraph Kernel Configuration Options 


# NETGRAPH KERNEL SUBSYSTEM 


options NETGRAPH #netgraph (4) system 
options NETGRAPH_ ASYNC 
options NETGRAPH_ BPF 

options NETGRAPH ECHO 
options NETGRAPH ETHER 
options NETGRAPH FRAME RELAY 
options NETGRAPH_ HOLE 
options NETGRAPH_IFACE 
options NETGRAPH KSOCKET 
options NETGRAPH_ L2TP 
options NETGRAPH_ LM 


# MPPC compression requires proprietary files (not included) 


#options NETGRAPH MPPC_COMPRESSION 
options ETGRAPH MPPC_ENCRYPTION 
options ETGRAPH_ONE2MANY 
options NETGRAPH_ PPP 

options ETGRAPH_ PPPOE 

options NETGRAPH_PPTPGRE 

options ETGRAPH_RFC1490 

options NETGRAPH SOCKET 

options ETGRAPH TEE 

options ETGRAPH TTY 

options NETGRAPH_UI 

options ETGRAPH VJC 
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the following is a good example: Listing 2. The start- 
ing “!” means that the password for user user3_ 2012 
is not stored in the mpd.secret file directly, but will be 
get by using the command /usr/iocal/etc/mpd5/passwd_ 
get.sh user3_2012, for example obtaining it from a data- 
base. This system allows sysadmin to print the plain- 
text password for the named user as a single line to 
standard output, and then exit, allowing mpd5 to inter- 
cept this operation. If there is an error, the command 
should print what is the matter, helping in troubles res- 
olution. 

Another special case is the username “*” in the mpd.se- 
cret file, particularly this line must be the last of the file and 
will match any username,using external programs like the 
previous passwd _get.sn to check a valid username. This 
wildcard matching only works for “!” lines. 

It’s basic that the total length of the executed command 
is less than 128 characters. Finally, it’s important to know 
that any additional arguments generated by the scripts will 
be visible to users on the local machine running ps. 

A good security practice is to limit access to the configu- 
ration folder as follows: 


# chown -R root:root /usr/local/etc/mpd5 
# chmod -R 0600 /usr/local/etc/mpd5 


Allowing mpd5 Traffic Through your Firewall 
with ipfw 

The standard firewalling system used in FreeBSD is iptw. 
If we are using mpd5 in a firewalled environment, we must 
pass the traffic through the firewall for permitting clients to 
connect to the PPTP server and our internal network, and 
this is a real example: Listing 3. 


Starting mpd5 
To allow mpd5 daemon to start at boot time, we must add 
these lines to our /etc/rc.cone file: 


# Host acts like a router 


gateway enable="YES” 


# MPDS 
mpd_enable="YES” 
mpd_flags="-b -s mpd5” 
arpproxy all="YES” 


And then restart network subsystem and mpd5 daemon, 
so our clients will be able to connect to it. 


# /etc/netstart 
# /usr/local/etc/rc.d/mpd5 start 
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On the Net 

+ Netgraph: http:/www.freebsd.org/cgi/man.cgi?query=netgraph 
&sektion=4 
Mpd5 Documentation: http://mpd.sourceforge.net/**doc5/ 
http://mpd.sourceforge.net/doc5/ 
FreeBSD kernel compiling: http://www.freebsd.org/doc/en_ 
US.ISO8859-1/books/handbook/kernelconfig-building.htm! 


The final step is to enable mpd5 logging with syslog, and 
we can do this by creating a new empty file and chang- 
ing its ownership only for root user 


# touch /var/log/ppp.log 
# chmod 600 /var/log/ppp.log 


Then we must add these lines in /etc/syslog.conf 


'mpd 
Riek /var/log/ppp.log 


After we saved the file, finally we must restart syslog 
# /etc/rc.d/syslogd reload 


Note 
We should check that all needed modules are loaded be- 
fore putting the server online (Listing 4). 


More Info About netgraph Subsystem 

Many user-space applications use the Netgraph facility as 
well as mpd5, and here is a list of netgraph kernel configu- 
ration options for various device-independent node types 
(Listing 5). 


Conclusion 

Mpd5 is a great way to establish secure connections rea- 
sonably affordable with any compatible device (IPAD, An- 
droid and of course, laptops and PC), all in a simple and 
fast way ,really a “must to try”. 
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Partitioning (part 1) 


In the previous articles the main features of PostgreSQL, 
including server-side programming were shown. In 

this article a simple application scenario will be used to 
demonstrate the capability of partitioning huge amounts of 
data into different tables in different spaces transparently. 


What you will learn... 

- Basic concepts of data partitioning 

+ how to implement a partition using PostgreSQL features 
* table inheritance 


der to be as realistic as possible, make use of a 

few hundred megabytes of data and will require 
a bit of time to execut and will consume part of your disk 
bandwidth. All the examples have been tested on a Post- 
greSQL 9.1 cluster running on a FreeBSD 8.2-RELEASE 
machine. 


fo lease note that the examples shown here, in or- 


Data Partitioning 

Data partitioning is a technique that aims at providing bet- 
ter performances and modularity, splitting the data set in- 
to groups with similar features. Partitioning makes sense 
when the amount of data becomes big and/or complex 
and there is a either a need for performance improvement 
or a storage requirement. To put it simply, splitting the da- 
ta set into smaller groups allows for both faster archiving 
of each group and faster data scanning (i.e., queries on 
the data). 

The partitioning is done in an horizontal way, meaning 
that data is not split into joined tables, but into tables that 
group data depending on some of their attribute values. 
Such tables will therefore have the same set of attributes, 
but different values which will determine the table that will 
handle a set of tuples. Since each table has a “fixed” set 
of values, indexes and queries generally will not need to 
filter such attributes, resulting in a smaller and faster data 
search path. 
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What you should know... 

+ basic shell commands 

+ basic PostgreSQL concepts 

+ server-side programming concepts as explained in the previous pa- 
pers 


Table Inheritance 

PostgreSQL supports table inheritance. That means a ta- 
ble can inherit (i.e., extend) a parent table. The concept 
is pretty similar to the OOP inheritance: having defined 
a master (or parent) table, it is possible to define a table 
that has the same properties (i.e., columns) and provides 
extensions (e.g., other columns, constraints, and so on) 
to the parent set. When applying table inheritance it is im- 
portant to remember a few concepts: 


* «nsert Statements are routed automatically to the par- 
ent table and not to the children; 

¢ other SQL statements (Such as setecr, uppATE, DELETE) 
automatically provide a union of the parent table and 
all its children, unless told to do not do so; 

e there is no automatic way to apply an unique con- 
straint (like a key) over a parent-child table: each ta- 
ble is considered as a different entity and therefore 
with different constraints; 

¢ foreign keys and unique constraints are not inherited 
and have to be set-up manually. 


Table inheritance is a key feature for data partitioning, 
since it guarantees that all partitions (i.e., child tables 
that inherit from a common ancestor) have the same da- 
ta structure, allowing the DBA to define the conditions 
that make the each partitions different from the others. 
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Application Scenario 

Partitioning is worth applying to a large database with da- 
tasets that can be split depending on a few well defined 
criteria. As an example, a database for a simple Web fo- 
rum will be implemented. Since Web forums are common- 
ly used tools with a very high average of messages per 
user; they can be considered a good test case for a parti- 
tioned scenario. As usual, the example presented here is 
kept as simple as possible for didactic purposes. 

The first step is to create a new database, named 
forumdb, and its Superuser forum that will handle all the fo- 
rum related data. In order to do that, connect as pgsqi to 
the template1 database and issue the following commands: 


templatel=# CREATE USER forum WITH LOGIN ENCRYPTED 
PASSWORD ‘forum’; 

CREATE ROLE 

templatel=# CREATE DATABASE forumdb WITH OWNER forum; 


CREATE DATABASE 


It is now possible to connect to the new database and 
to define the structure and the tables as shown in List- 
ing 3. For the purposes of this example it suffices to 
have an author table that will contain information about 
each single forum user, a category table that will con- 
tain a description of available forum categories (e.g., 
hardware, general, networking and so on) and a table 
to contain each message (thread). The thread table is 
the core of the forum database and contains a mes- 
sage title and content, as well as foreign keys that ref- 
erence the message’s the author and its category. To 
keep the example simple each message is identified by 
a “thread identifier” and a “message identifier”, columns 
tid and mid, respectively. All messages with the same 
tid belong to the same thread and are ordered by the 
mid value, with the starting thread having a mid value 
of zero. 

The view vw_thread provides an easy way to get a list 
of all threads with information about authors, categories 


Listing 1. Output of vw_thread view 


Listing 2. Size of the thread table 


FROM thread t JOIN category c ON t.category pk = c.pk 
GROUP BY c.description 
ORDER BY c.description; 


description | count | min | max 
------------- $---------4------------4------------ 
General | Delos || L900: || AOA Oey 
Hardware | SOLA) || Weil iak |] 2(Oeveoe)—10)11 
Kernel | 848802 | 1993-01-05 | 2012-04-17 
Networking | 1493115 | 1992-01-05 | 2012-04-17 


forumdb=> SELECT * FROM vw_thread WHERE thread_id = 1; 

thread_id category main title | started by message replied by 

ee eee aN ee ee 
Hardware Thread start 1 | Luca Ferrari @ 1991-01-11 08:02:00 | How do you do this? | 
Hardware think so and so. A. Ferrari @ 1991-01- 08:04:00 
Hardware think so and so. Claudio @ 1991-01- 08:06:00 
Hardware think so and so. Ritehie @ 1991-01- 08:08:00 
Hardware think so and so. A. Ferrari @ 1991-01- 08:10:00 
Hardware think so and so. Claudio @ 1991-01- 08:12:00 
Hardware think so and so. eubicfolelaies (Cl Alay 08:14:00 
Hardware think so and so. AY Ferrari @ 19 91—Oi= 08:16:00 
Hardware think so and so. Claudiond 9 91—0i= 08:18:00 
Hardware think so an 


forumdb=> SELECT c.description, COUNT(t.mid), MIN(t.published_on), MAX(t.published_on) 
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Listing 3. The definition of the database for the forum application 


scenario 


DROP TABLE IF EXISTS author; 
CREATE TABLE author ( 
pk SERIAL NOT NULL, 
username text NOT NULL, 


display name text, 


PRIMARY KEY (pk), 


UNIQUE ( username ) 


INSERT INTO author( username, disp 


ay name) 


VALUES ( ‘flucal978’, ‘Luca Ferrari’ ); 
INSERT INTO author( username, display name) 
VALUES( ‘ringhiobd’, ‘A. Ferrari’ ); 

INSERT INTO author( username, display name) 
VALUES ( ‘winter’, ‘Claudio’ ); 

INSERT INTO author( username, display name) 
VALUES( ‘root’, ‘Ritchie’ ); 


DROP TABLE IF EXISTS category; 
CREATE TABLE category ( 

NOT NULL, 

id varchar NOT NULL, 


pk seria 


description text, 


since date default 


PRIMARY KEY (pk) , 
UNIQUE (id) 


NSERT INTO category(id, description, 


‘now’ ::text::date, 


since) 


CREATE 


CREATE 
AS 
SELECT 


TABLE thread ( 
NOT NULL, 


tid integer NOT NULL, 


pk serial 


mid integer NOT NULL, 
title text, 


content text, 


published on date default ‘now’::text::date, 
published at time default ‘now’::text::time, 
category pk integer, 

author pk integer, 
PRIMARY KEY (pk), 
UNIQUE (tid, mid), 


FOREIGN KEY (category pk) REFERENCES category (pk) , 


FOREIGN KEY (author pk) REFERENCES author (pk) 


OR REPLACE VIEW vw_thread 


tid AS thread _id, 
c.description as category, 
CASE WHEN t.mid = 0 THEN title 
HLSHe 
END as main title, 
CASE WHEN t.mid = 0 THEN a.display name || * 
@ * || t.published_on || ‘ * || 
t.published_at 
BLUSH 
END as started by, 


t.content as message, 


CASE WHEN t.mid <> 0 THEN a.display name || * 
Ce publetsiedkon ||| as sli 
t.published_at 


iiss, 


VALUES ( 
NSERT INT 

VALUES ( ‘net’, 
NSERT INT 

VALUES ( 
NSERT INT 

VALUES ( ‘dev’, 
NSERT INT 

VA. 


LUES(‘hw’, ‘Hardware’, 


‘mise’ , 


‘kern’, 


‘General’, ‘01/01/1990’ ::text::date) ; 


[O category(id, description, since) 
‘Networking’, ‘01/01/1992’ ::text::date) ; 
[O category(id, description, since) 

‘Kernel’ ,’01/01/1993’ :: text: :date) ; 
[O category(id, description, since) 


‘Development’ ,’ 31/01/1993’ ::text::date) ; 


[O category(id, description, since) 


01/07/1991’ ::text::date) ; 


DROP TABLE IF EXISTS thread; 


END as replied by 
FROM ((thread t JOIN author a ON 


category c ON t.category pk = c.pk) 


OUMDINRE JEN fecicalil, ic quaauel 


author pk = a.pk) JOIN 
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Listing 4a. The populate_forum stored procedure 


CREATE OR REPLACE FUNCTION populate _forum() 


RETURNS 
AS 
SBODYS 
DECLARE 


BEGIN 


VOID 

category multiplier integer; 

thread per author integer; 

message per thread integer; 
available authors integer; 

current tid integer; 
current_mid integer; 
current_category category*rowt ype; 
current_author author%Srowtype; 
current _writer author%rowtype; 
current _published_at time; 
current_published_on date; 

-- a multiplier for the size of each category 
Category emullenoleiera sil; 


-- how many authors are there? 
SELECT count (pk) 
FROM author 


INTO available authors; 


-- append to the max tid if any is existing 
SELECT max (tid) 
FROM thread 


IF current tid IS NULL THEN 


current tid ;= 0; 
ELSE 

Ciblercyone, jwull G— Cibkereeyaho, jealfol se ily 
END IF; 


-- iterate over each category 
FOR current_category IN SELECT * 
FROM category 
ORDER BY id 
LOOP 
-- increment the category multiplier 
category multiplier := category_ 


multiplier + ly 


RAISE INFO ‘Category % (multiplier 


%)’, current_category.description, 


category multiplier; 


=— set the initial date and time’ for 
publishing this category 

CuErent published sone —scierenta 
category.since + 2; 

currents pubilliishedsaty:— time )082 022000; 

RAISE INFO ‘category start date % at 
time %’, current _published_on, 


current published at; 


=- iterate on each author 

FOR current author IN SELECT * 
FROM author 
ORDER BY username 
LOOP 


-- how many message per author 
and per thread? 

thread per author := category_ 
multiplier * 300; 


WHILE thread per author > 0 AND 
current_published_on < current _date LOOP 
--RAISE INFO ‘ % threads 
remaining’, thread per author; 
@ibierisiale jeaiel RE (eibbecsigie jeael ar ily 
emiaasine iil P= (0p 
current published on := current_ 


published on + interval ‘2 days’; 


INSERT INTO thread( tid, mid, 
title, content, category pk, author pk, 
published on, published at ) 

VALUES ( current_tid, 
current mids “threadiistart  \ (||| 
Current tid, “Hew do you do this?” , 
current_category.pk, current _author. 
pk, current _published_on, current_ 
published at ); 
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Listing 4b. The populate_forum stored procedure 


thread per author := 
thread per author - 1; 

message per thread := 
category multiplier * 20 * available 


authors; 


WHILE message per thread > 
Q LOOP 
-- now insert replies from 
other authors 


FOR current_writer IN 


SHEECE * 

FROM 
author 

WHERE 
pk <> current_author.pk 

ORDER 
BY display name 

LOOP 


current_mid := 
current mid + 1; 

message per thread := 
Messagcupommennea cle ial, 

current published at 
:= current published at + interval ‘2 
minutes’ ; 

INSERT INTO thread ( 
tid, mid, title, content, category_ 
pk, author pk, published _on, 
published at ) 

VALUES ( current_tid, 
currentimid,) \ihread) reply) )|)|| 
current tid, ‘I think so and so.’, 
current_category.pk, current_writer. 


pk, current published on, current_ 


published at ); 


END LOOP; 
END LOOP; -- end of the while 


for the thread messages 


END LOOP; -- end of the while per 
author 
END LOOP; -- end of the iteration over 
authors 


END; 
SBODYS 


END LOOP; -- end of the category iteration 


LANGUAGE plpgsql; 


Listing 5. A stored procedure that creates all the per-category tables 


CREATE OR REPLACE FUNCTION create category tables ( 


RETURNS 
AS 
SBODY$ 
DECLARE 


BEGIN 


END; 
SBODY$ 


integer 
current category category%srowtype; 
created tables integer; 


created_tables := 0; 

-- iterate over each category 

FOR current category IN SELECT * 
FROM category 
ORDER BY id 
LOOP 


-- a dynamic query for creating the table 

EXECUTE ‘CREATE TABLE IF NOT EXISTS 
threadiy’ ||| cumrent category id 

Dy PCHECKi(cabegoryap =) ||| 

CurbentEcaregory: pk ||| )y 

*’ PRIMARY KEY(pk), * 

‘ FOREIGN KEY (category pk) 

REFERENCES category(pk), * 

‘ FOREIGN KEY (author pk) 

REFERENCES author(pk), *‘ 

* UNIQUE (tid, mid) *‘ 

‘) INHERITS (thread) ;'; 


Cwasiciecl walls o= Ciseliviecl ialolles ap i): 
END LOOP; -- end of the category iteration 
RETURN created tables; 


LANGUAGE plpgsql; 
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and thread start/replies, so that for instance two different 
threads are shown as follows: Listing 1. 

Having defined the main structures, it is possible to 
populate the database with a set of threads and re- 
plies. In order to simulate a workload the special stored 
procedure populate_forum has been defined (see List- 
ing 4). The idea is that for each category each author 
will start a certain number of threads and all the other 
authors will post a specific number of replies to those 
threads; by convention each thread starts two days af- 
ter the previous one from the same author, and all re- 
plies come with a little different time within the same 
day; finally the population stops when either the num- 
ber of threads-per-author or the current date is reached 
on each category. While the time scenario is not a real 
one, it will be useful to stress the partitioning implemen- 
tation. 

Executing the stored procedure makes the thread table 
grow to more than 4 million tuples, scattered amongst cat- 
egories as follows: Listing 2. 


For a total number of 4043657 tuples in 49828 data 
pages for around 390 MB of disk space. It is now time to 
partition the data! 


Partitioning Using the Forum Category 

A first kind of partitioning could be based on the message 
category: since each thread belongs to one and only one 
category and each category can live on its own, this is a 
good first partition schema. In order to implement this par- 
tition it is required to: 


* create a table for each category; such table will han- 
dle all the threads that belong to such category. It is 
worth noting that these tables have the same data 
structure of the main thread table; 

¢ build appropriate constraints on each table to avoid 
the erroneous or malicious manipulation of data be- 
longing to another category (e.g., the “net” category 
table cannot accept threads or queries that refers to 
the “kern” category); 


Listing 6. A stored procedure that migrates all the data into the 
appropriate table 


CREATE OR REPLACE FUNCTION migrate threads ( 
RETURNS VOID 
AS 
SBODY$ 
DECLARE 
current category category%Srowtype; 


BEGIN 


-- iterate over each category 

FOR current _category IN SELECT * 
FROM category 
ORDER BY id 
LOOP 


RAISE INFO ‘Inserting for category 3%’ 
current_category.description; 

-- copy each tuple in the right table 

EXECUTE ‘INSERT INTO thread_’ | 
current _category.id 

LI SS SELECT *SHROM thread) 

lI 

current _category.pk; 


WHERE category pk = ‘ || 


-- delete tuples from the master table 
RAISE INFO ‘Deleting for category 3%’, 


current_category.description; 
EXECUTE ‘DELETE FROM ONLY thread *‘ 
[|S SWHERE category spk — )™)|)|| 
current _category.pk; 
END LOOP; -- end of the category iteration 
END; 


SBODYS$ 
LANGUAGE plpgsql; 


Listing 7. A rule to handle inserting into the net category 


CREATE OR REPLACE r thread_insert_net AS 
ON INSERT TO thread 
WHERE new.category pk = 2 
DO INSTEAD 
INSERT INTO thread _net (pk, tid, mid, title, content, 
published _on, published at, 
category pk, 
author pk) 
SELECT new.pk, new.tid, new.mid, new.title, new. 
content, 
new.published_on, new. 
published at, new.category pk, 


new.author_pk 
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Listing 8. A stored procedure to automate the creation of all rules 
associated to the thread table 


CREATE OR REPLACE FUNCTION create category rules ( 


RETURNS integer 


ID NG) Ole = PUINESINISMII! IUISISN 


SELECT id 
FROM category 
INTO current category id 


AS WHERE pk = NEW.category pk; 
SBODYS ENDS EE, 
DECLARE 
current category category%rowtype; -- build the table name for the right insert 
created_rules integer; SELECT ‘thread ’ || current_category_id 
BEGIN INTO current category table name; 
Caeeineel wiles. F= (0) 
-- iterate over each category RAISE LOG ‘The query is going to be re-routed to 
FOR current category IN SELECT * the table %’, 
FROM category current _category table name; 
ORDER BY id 
LOOP -- execute the insertion into the right table 
ih GEOR — SS INSERT STEN 
-- rule for INSERT 
EXECUTE ‘CREATE OR REPLACE RULE r_ thread_ EXECUTE ‘INSERT INTO * 
insert _’ || current_category.id || current_category table name 
‘ AS ON INSERT TO thread * |] ~ Siete Gils 
‘ WHERE NEW.category pk = * USING NEW; 
|| current_category.pk END IF; 
’ DO INSTEAD: * 
‘ INSERT INTO thread’ || RETURN NULL; 
current _category.id END; 
S SHUG TAINEWE* "7 SBODY$ 
LANGUAGE plpgsql; 
Creabedmnrulesi —Nereacederulless ra, 
Listing 10. A stored procedure to create all the re-routing triggers 
END LOOP; -- end of the category iteration 
CREATE OR REPLACE FUNCTION create category triggers ( 
RETURN created_rules; RETURNS void 
END; AS 
SBODYS SBODYS 
LANGUAGE plpgsql; DECLARE 
current category categorySrowtype; 
Listing 9. A trigger function that re-routes tuple insertion BEGIN 
EXECUTE ‘DROP TRIGGER IF EXISTS tr_thread_ 
CREATE OR REPLACE FUNCTION thread_partitioning handler () trigger ON thread’; 
RETURNS trigger 
AS || \ BEFORE INSERT * 
SBODY$ ||“ ON thread * 
DECLARE || \ FOR EACH ROW * 
current_category id text; ||‘ EXECUTE PROCEDURE thread_ 
current_category table name text; partitioning handler ()’'; 
BEGIN END; 
RAISE LOG ‘Trigger executing as %’, TG OP; SBODYS 
-- get the name of the category LANGUAGE plpgsql; 
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* migrate all existing data from the thread table to the 
right per-category table; 

*« avoid the direct manipulation of the thread table, 
since all the queries must be “routed” to a specific 
per-category table; 

¢ allow transparent querying of the thread table and of 
its per-category children. 


Steps 1 and 2: 

Creating the Tables and the Constraints 

The first two steps can be done using a stored procedure, 
so that if the number of categories changes over time new 
partition tables can be created in an automated way. The 
idea is simple: for each category in the category table a 
new table named thread_categoryldentifier (e.g., thread_ 
kern) will be created. Such table will have the same data 
structure of the main thread table and will have a check 
constraint that assures that the category_pk column re- 
fers to only one category; moreover all foreign keys and 


unique constraints have to be redefined. Therefore this 
results in a manual statement like the following: 


CREATE TABLE IF NOT EXISTS thread_kern ( 
CHECK (category pk = 2), 
PRIMARY KEY (pk), 
FOREIGN KEY (category pk) REFERENCES category (pk), 
FOREIGN KEY (author pk) REFERENCES author (pk), 
UNIQUE (tid, mid) 

) INHERITS (thread) ;’; 


As stated before, using a stored procedure the whole 
task of creating each individual category table can be 
automated, and therefore the procedure shown in Listing 
5 can be used to create all the tables. 


Step 3: Migrating Existing Data to the Right Table 
Having all the per-category tables in place it is pos- 
sible to migrate each tuple from the main thread table 


Listing 11. The first messages on the thread table before migrating data 


forumdb=> SELECT * FROM thread WHERE tid = 1; 


joe || telol || iwetel || title | content | published_on | published at | category pk | author pk 

----- Se 
i | i || OM | ihinead ssitarta ls | PHowscdomyouncom thats |p oon (ie | 08:02:00 | by || il 
A || i | I thread erepliv ale ie thunke somandieson ues pee oO Oli | 08:04:00 | By || 2 
3 || it || 2.1 threadimepiliy iP i chankssomandisonss | 2991 Osi: | 08:06:00 | by | 3 
4 | it || 3 | Thread reply 1 | I think so and so. | 1991-01-11 | 08:08:00 | || 4 

Listing 12. The first messages are now on the hardware category table 

forumdb=> SELECT * FROM thread hw WHERE tid = 1; 

pk | tid | mid | title | content | published_on | published_at | category pk | author pk 

----- Se ee 
il | il | 0 | Thread start 1 | How do you do this? | 1991-01-11 | 08:02:00 | by || il 
| it] I |Peihneadsreply sy i thankesomandmsonss | Loon — (iia | 08:04:00 | br | Z 
3 || i || 2 | Thread reply 1 | I think so and so. | 1991-01-11 | 08:06:00 | S || 3 
4 | hy 3 | Thread reply 1 || ET think sovand so. | 1997-01-11 | 08:08:00 | Biel 4 


forumdb=> SELECT * FROM ONLY thread WHERE tid = 1; 


pk | tid | mid | title | content | published on | published at | category pk | author pk 


----4----- $o---- foonnn-- toonn----- $oonnn---------- foonnn----- 


(0 rows) 


soecdhessssecsssess dboscscscsess 
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Listing 13. The stored procedure that creates all the tables for 
categories and years within a category 


CREATE OR REPLACE FUNCTION create category tables ( 
RETURNS integer 


SELECT EXTRACT( year FROM MAX( published on ) ) 
INTO current _max_year 
FROM thread 
WHERE category pk = current_category.pk; 


AS 
SBODYS RAISE LOG ‘Generating time tables from 
DECLARE WeEle a ice) Weche “a! > Wbleideime Scie 
current category category%Srowtype; current max year; 
created tables integer; 
current_table name text; WHILE current_year <= current_max_year LOOP 
current_year integer; RAISE LOG ‘Creating sub-table for 
Vedrio. CuULrmenibm year, 
current_year to check integer; current year to check := EXTRACT ( 
current max year integer; year FROM current _category.since ) + 
BEGIN current year - 1; 
Created tables i= 0; 
-- iterate over each category EXECUTE ‘CREATE TABLE IF NOT EXISTS thread ’ 
FOR current category IN SELECT * current _category.id || ‘_ 
FROM category year’ || current_year 
ORDER BY id Nia 
LOOP : CHECK * 
‘ EXTRACT (year FROM 
published on) = * 
-- build a dynamic query for creating the current _year 
table Mate 
--EXECUTE ‘DROP TABLE thread _’ || * PRIMARY KEY(pk), *‘ 
current_category.id; ‘ FOREIGN KEY (category pk) 
EXECUTE ‘CREATE TABLE IF NOT EXISTS REFERENCES category(pk), * 
thread_’ || current_category.id 
STI(MCHECK (categony spk V51|)| * UNTOUR (tid, mid) * 
CUIASMe CAIEeeOAyaol< ||| “YY, ~ N)) TENPSDRFE MENS) ((Y 
* PRIMARY KEY(pk), * || ‘thread_’ || current _category.id 
\ FOREIGN KEY (category pk) ae 
REFERENCES category(pk), *‘ 
‘ FOREIGN KEY (author pk) 
REFERENCES author(pk), *‘ 
VSUNTOUR (tad, amid) CUleeINE Weeue FS Cbleideinic yyetene ap ile 
‘) INHERITS (thread) ;’; 
END LOOP; -- end of the per-year-while 
erearedmvabiles j—serecarcds cables srl, 
END LOOP; -- end of the category iteration 
-- compute the current year 
current_year := EXTRACT(year FROM RETURN created_tables; 
cumment Category: since)! 
-- get the max year for the current END; 
category, so that SBODYS 
-- no more tables than the max year will LANGUAGE plpgsql; 
be created 
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Listing 14. The stored procedure to migrate existing partitions and 
split them depending on the time 


CREATE OR REPLACE FUNCTION migrate threads by category_ 


and time () 

RETURNS VOID 

AS 

SBODY$ 

DECLARE 
current_category category%rowtype; 
current_year integer; 
current_year to check integer; 
current_max year integer; 

BEGIN 


-- iterate over each category 

FOR current category IN SELECT * 
FROM category 
ORDER BY id 
LOOP 


current year := EXTRACT(year FROM current_ 


category.since) ; 


--find the max year for this category 

SELECT EXTRACT( year FROM MAX( published on 
N ) 

INTO current_max_year 

FROM thread 

WHERE category pk = current _category.pk; 


WHILE current_year <= current_max_ year LOOP 


current_category. 
description, current_year; 
-- copy each tuple in the right table 
EXECUTE ‘INSERT INTO thread’ || 
current _category.id 
‘year’ || current_year 
‘ SELECT * FROM ONLY 
thread_’ || current_category.id 
‘ WHERE category pk = ‘ || 
current_category.pk 
‘ AND ( EXTRACT( year FROM 
publishedion™)\ 
\ 


Se lmeuenentayedn 


Yyr; 


-- delete tuples from the master table 
RAISE INFO ‘Deleting for category 3%, 
year %', 
Current Category. 
description, current_year; 
EXECUTE ‘DELETE FROM ONLY thread’ || 
current _category.id 
|| * WHERE category pk = * || 
current _category.pk 
|| * AND ( EXTRACT( year FROM 
published_on ) * 


[| > = [ll] Ciemmesiaie Syeeue 


CUREENIEE Carns CUrGenibE years tisla, 
END LOOP; -- end of the per-year loop 
END LOOP; -- end of the category iteration 


END; 


SB 


ODY$ 


LANGUAGE plpgsql; 


Listing 15. An extract of the result of data partitioning based on 
categories and times. 


forumdb=> SELECT relname, reltuples FROM pg class WHERE 


relname like ‘thread%’ AND relkind = 


‘r’ ORDER BY relname; 


relname reltuples 
Seseseeasesebese sense jebsesssease 
hread 0 
hread_hw 0 
hread_hw_year1991 29014 
hread_hw_year1992 29829 
hread_hw_year1993 29666 
hread_hw_year2004 5053 
hread_kern 0 
hread_kern_year1993 43621 
hread_kern_year1994 43862 
thread_kern_year1995 44103 
thread_kern_year1996 44103 
hread_kern_year1997 43862 
hread_kern_year2012 dS 255 
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to the per-category table to which it belongs: each tu- 
ple must be first inserted in the right table and then de- 
leted from the main table. For instance, to migrate the 
“net” category tuples a transaction should issue the fol- 
lowing: 


INSERT INTO thread_net SELECT * FROM thread WHERE 
category pk = 2; 
DELETE FROM ONLY thread WHERE category pk = 25 


And similarly, this should be done for all other catego- 
ries, as shown by the function migrate_threads in Listing 
6. Careful readers will have noticed a new keyword, ON- 
LY, in the veterz query above. Almost every PostgreSQL 
command is aware of the table inheritance and accepts 
an ONLY clause to specify that the command must be 
routed exactly to the table without having to consider the 
inheritance chain. Therefore, in the above commands, 
the insert is performed in a child table, while the ve.erz 
only on the parent table. 

Before the migration the situation on the thread table 
was the following: 


forumdb=> SELECT category, count (message) 
FROM vw_thread 
GROUP BY category; 

category | count 
See eee 7 eee ee 


Networking | 1493115 


Kernel | 848802 
Hardware | 391200 
General | 1310540 


while after the migration the situation is the same, with 
regard to the number of threads per category, but has 
a different layout (each category is now contained in a 
specific table and the thread table is empty): 


forumdb=> SELECT relname, reltuples::integer 
FROM pg_ class WHERE relname LIKE ‘thread%’ AND relkind = ‘r’; 


relname reltuples 
2o222e22Ste22 Peosskasoes 
thread (0) 
thread_hw 391200 
thread_kern 848802 
thread_misc 1310540 
thread_net 1493115 


Please note that for the above query to report the right 
result it might be required to run a vacuum FuLL aANALYzE. 
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Step 4: 

Avoiding the Manipulation of the Thread Table 

This step can be implemented either with rules or triggers 
on the thread main table: the idea is to intercept a state- 
ment routed to the thread table (e.g., 1nszrr) and re-route 
it to the appropriate table depending on the category in- 
formation within the statement itself. For instance, when a 
new message is posted within the net category (pk = 2), 
the tuple must be stored in the thread_net table and not in 
the thread table. 


Using Rules 

Rules allow the rewriting of queries so that the real target 
(i.e., the table against which the statement will be execut- 
ed) can be altered as required. For instance, for an tnserr 
statement to be routed to the “net” table (ok = 2) a rule like 
the following is required: Listing 7. 

Such a rule, “attached” to the thread table, will re-route 
an insert of the “net” category into the thread_net ta- 
ble. Again, in order to get the rule creation automated, 
a stored procedure is used (see Listing 8). It is worth re- 
minding readers that vepare, setecr and peLere Statements 
already include the children tables, and therefore there is 
no explicit need to define rules for such queries, even if 
this were possible. 

It is worth noting that the creation of rules, in particu- 
lar the delete ones, will avoid manipulation of the thread 
table. This is the reason why data was migrated in the 
previous step, otherwise it would have been impos- 
sible to remove scattered tuples from the thread table. 
By the way, it is always possible to disable and re-en- 
able the delete tuples in order to migrate data after this 
step. 


Using Triggers 

Since insert statements need re-routing to the right table, 
and this has to be applied before the tuple commits to the 
thread main table, it is possible to build a simple trigger 
function (see Listing 9) that can be associated with the 
thread table. 

The thread_partitioning_handler trigger function acts 
for “before” events and performs an insert into the right 
table. It is worth noting the usage of the special syntax 
EXECUTE...USING that allows for the usage of the NEW 
trigger tuple. Moreover, please note that the trigger al- 
ways returns NULL meaning that no tuple must hit the 
thread table as result of an inserr statement. The trig- 
ger function is attached to the thread table via a stored 
procedure (see Listing 10), so that each time the proce- 
dure Is called the triggers for each available category are 
generated. 
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RETURNS void 
AS 


AS 

SBODYS 

DECLARE 
curren 
CUurFen 
curren 
CurEFenN 

BEGIN 
RAISE 


Listing 16. The stored and trigger procedures to enable INSERT 
triggers on per-category tables 


CREATE OR REPLACE FUNCTION create category time_ 


triggers () 


SBODY$ 
DECLARE 
current category category%rowtype; 
BEGIN 
-- iterate over each category 
FOR current category IN SELECT * 
FROM category 
ORDER BY id 
LOOP 
EXECUTE ‘DROP TRIGGER IF EXISTS tr_thread_ 
trigger time ON * 
Wienncade a) ||(curncnimcategonye ld, 
EXECUTE ‘CREATE TRIGGER tr_thread_trigger_time ‘ 
‘’ BEFORE INSERT * 
‘ ON thread_’ || current_category.id 
‘ FOR EACH ROW * 
‘ EXECUTE PROCEDURE thread_ 
subpartitioning handler ()’; 
END LOOP; 
END; 
SBODY$ 


LANGUAGE plpgsql; 


CREATE OR REPLACE FUNCTION thread _subpartitioning_ 


handler () 


RETURNS trigger 


t_category id text; 
t_ category table name text; 
t_category since date; 
t_ category year integer; 


LOG ‘Trigger executing as %’, TG OP; 


-- get the name of the category 
Ie ING) OI = ‘INSERT’ THEN 


-- get the current category id and 
current year 
SELECT id, EXTRACT( year FROM NEW. 
published_on ) 
FROM 
INTO 


category 


current category id, current_ 


category year 


== build the table name for the right 
insert 
SELECE Vehneadey ||| curment category sid 
[|S syeeue 
[iMeUBEeniimeategonya 
year 


INTO current category table name; 


END IF; 


RAISE LOG ‘The query is going to be re-routed to 
the table %’, current category table_ 


name; 


-- execute the insertion into the right table 


IF TG OP = ‘INSERT’ THEN 
EXECUTE ‘INSERT INTO * 
|| current_category table name 
[| * Sine Silos S 
USING NEW; 
END IF; 


RAISE LOG ‘Trigger finished!’; 
RETURN NULL; 


END; 
SBODY$ 
LANGUAGE plpgsql; 
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Step 5: Provide Transparent Querying of the Thread 
Table 

Believe it or not, this is the easiest part since PostgreSQL 
gives you this for free! Each szzecr targeted to the thread 
table will be automatically perform an union of all the re- 
sults of the thread children: Listing 11. Which reports the 
same results as if the query were run against the single 
thread_hw table (category_pk = 5): Listing 12. 

This is because the szrecr statement automatically pro- 
vides a union of the table against which it is run (e.g., 
thread) and all its children. Using the ONLY keyword it 
is possible to tell PostgreSQL to avoid descending the 
inheritance chain, so that the following query now fails 
since the thread table is empty: Listing 12. Similar consid- 
erations are true for other non-insert statements, so there 
is no special need to build a specific set of rules/triggers. 


Time-based Partitioning 

It is possible to improve the partitioning just completed 
by adding a new partition based on the thread time (i.e., 
the published_on column). The idea is to refine each per- 
category table by setting up a table that will contain each 
year of messages. This means that per-year tables will 


Box 1. How to quickly (re)set the database 

In order to perform the partitioning examples described here 
it is possible to quickly drop and rebuild the whole database. 
The source code of the example is contained in the GitHub re- 
pository (see the references) in the bsdmag/05-partitioning 
folder, therefore, being connected to the forumdb it is possi- 
ble to issue the following queries: 


DROP TABLE IF EXISTS thread_net CASCADE; 
DROP TABLE IF EXISTS thread _ misc CASCADE; 
DROP TABLE IF EXISTS thread_kern CASCADE; 
DROP TABLE IF EXISTS thread_misc CASCADE; 
DROP TABLE IF EXISTS thread CASCADE; 

DROP TABLE IF EXISTS author; 

DROP TABLE IF EXISTS category; 

DROP VIEW IF EXISTS vw_thread; 

\i 01-forum-database-initial-setup.sgl 

\i 02-function-populate.sql 

SELECT populate forum(); 

\i 03-function-create-category-tables.sql 
SELECT create category tables(); 

\i 04-migration.sql 

SELECT migrate threads (); 

\i 05-thread-table-rules.sql 

SELECT create category rules(); 


And to enable the per-year subpartitioning: 


\i 07-partitioning-time.sql 

SELECT migrate threads by category and _time(); 
SELECT create category time triggers (); 

and last issue 


VACUUM FULL ANALYZE; 
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On The Web 

+ PostgreSQL official Web Site: http:/,www.postgresql.org 
ITPUG official Web Site: http:/Avww.itpug.org 
PostrgeSQL Table Inheritance Documentation: http:// 


www.postgresql.org/docs/current/static/ddl-inherit.html 
GitHub Repository containing the source code of the ex- 
amples: https://github.com/fluca1978/fluca-pg-utils 


be defined for each category, for instance thread_net_ 
year1992, thread_net_year1993 and so on up to the last 
year a post for the category exists; each post will be then 
be routed to the right year table. 

The partitioning steps are similar to those explained in the 
previous example, but now the first step is to create per-year 
tables, and therefore the stored procedure create_catego- 
ry_tables is changed as shown in Listing 13. After the tables 
are in place, data must be migrated from per-category tables 
to per-year tables, and this is done similarly to previous par- 
titioning using the migrate_threads_by_category_and_time 
stored procedure (see Listing 14), that produces a database 
populated as shown in Listing 15. As a last step, a trigger to 
handle inserr statements over each per-category table has 
to be set, and this is automated through the create_catego- 
ry_time_triggers stored procedure shown in Listing 16. 

In the final scenario the master table thread cascades 
queries to per-category tables (e.g., thread_net) which in 
turn cascade to per-year tables within the same category 
(e.g., thread_net_year2004). 

As readers can see, the above examples are quite sim- 
ple and partially simulate a real situation. Of course having 
the ability to pre-partition the database before having to 
deal with huge amounts of data allows a better database 
design, and nullifies the migration of data. For instance, 
having chosen from the beginning to partition messages 
by time instead of by category could have remove a layer 
of inheritance (the per-category tables) that could result in 
a more difficult to maintain database. 


Summary and Coming Next 

This article examined the data partitioning and the table in- 
heritance that can be exploited in PostgreSQL to achieve 
data splitting across multiple tables. In the next article the 
tablespace feature will be presented as wells as another 
data partitioning scenario. 
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Securing DNS 
Transactions 


In the June 2012 issue, we outlined the threats, security 
objectives, and protection approaches for various DNS 


transactions. 


What you will learn... 
« How to restrict transactions based on IP address 
+ How to configure TSIG for BIND and NSD 


menting those approaches, as well as operational 

best practices that go with those implementations. 
The DNS protocol protection approaches are elaborated 
here in more detail: 


T his article provides the steps involved in imple- 


Restricting Transaction Entities Based on IP Address 

In this type of implementation, the DNS name servers and 
clients participating in a DNS transaction are restricted 
to a trusted set of hosts by specifying their IP address- 
es in appropriate access control statements provided by 
the name server software. The protection provided by 
these IP-based access control statements can be circum- 
vented by attacks such as IP spoofing. Hence, this solu- 
tion is not recommended for DNS query/response, zone 
transfer, and dynamic update transactions that have high 
threat impact. However, for the DNS NOTIFY transaction, 
where the only threat is spurious notification (which may 
not even trigger a zone transfer), an access control based 
on IP address will suffice. Although this solution is not rec- 
ommended generally, a description of the mechanics of 
access control using IP addresses is provided in “Restrict- 
ing Transaction Entities Based on IP Address” because 
the same statements are used to identify hosts based on 
named keys while implementing transaction protection 
using hash-based message authentication codes. This 
approach has been implemented for all DNS transactions. 


BSD 


MAGAZINE 


x4 


What you should know... 


+ An understanding of how DNS works 


Transaction Protection through Hash-Based 
Message Authentication Codes (TSIG Specification) 
In this approach, transaction protection is enabled 
through generation and verification of hash-based mes- 
sage authentication codes (HMAC). Because these codes 
are embedded within a special RR of RRType TSIG, the 
specifications that outline protection of DNS transactions 
using HMAC are called TSIG in the DNS community. 
TSIG specifications are described in RFC 2845 and 3007. 
Application of TSIG specifications for protection of zone 
transfer and dynamic update transactions is described in 
Transaction Protection Through Hash-Based Message 
Authentication Codes (TSIG). 


Transaction Protection through Asymmetric Digital 
Signatures (DNSSEC Specification) 
This approach, which goes by the name DNS security ex- 
tensions (DNSSEC), is described through a family of RF- 
Cs 4043, 4044, and 4045. The core services provided by 
DNSSEC are data origin authentication and integrity pro- 
tection. DNSSEC is used mainly for securing DNS infor- 
mation obtained from DNS query/response transactions. 
The deployment issues of DNSSEC are described in Part 
2D: 

Before we dive into the article, | want to take a moment 
to talk about NSD and Unbound vs. BIND. Most of you 
know that BIND is the de facto standard DNS server. It’s 
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a free software product and is distributed with most UNIX 
and Linux platforms. 

NSD is an open-source authoritative server developed 
by NLNet Labs of Amsterdam in cooperation with the 
RIPE NCC. NSD is a test-bed server for DNSSEC, and 
new DNSSEC protocol features are often prototyped us- 
ing the NSD code base. The original intention of NSD was 
to develop an authoritative server implementation inde- 
pendent of BIND that could be used on root servers, thus 
making the root zone more robust through software di- 
versity. Three root servers and several top-level domains 
now use NSD, but you don’t have to be a root server or 
TLD to benefit from NSD’s robustness, speed, and sim- 
plicity. 

Unbound is a recursive DNS server that is complemen- 
tary to NSD. It was developed in C by NLNet Labs from 
a Java implementation by Verisign, Nominet, Kirei, and 
EP.NET. Together, NSD and Unbound provide flexible, 
fast, secure DNS service appropriate for most sites. The 
NLNet Labs components are not as mature as BIND and 
do not have as many bells and whistles, but they are fine 
solutions for most sites. 

The DNSSEC code in NSD and Unbound is more robust 
and better tested than that in BIND. It’s also faster. For ex- 
ample, Unbound is about five times faster than BIND at 
verifying DNSSEC signatures. BIND still has an edge in 
some areas, though, notably in documentation and in ex- 
tra features. For a really robust DNS regime, run both! 


Restricting Transaction Entities Based on IP 
Address 
Some DNS name server implementations, such as BIND 
9 and higher, provide access control statements through 
which it is possible to specify hosts that can participate in 
a given DNS transaction. The hosts can be identified by 
their IP address or IP subnet reference (called /P prefix) in 
these statements. 

The list containing these IP addresses and/or IP pre- 
fixes is called an address match list. (An address match 
list can be made up of other things besides IP address- 


Table 1. BIND Access Control Statement Syntax for DNS Transactions 


Securing DNS Transactions 


es and IP prefixes, as described in “Restricting DNS 
Query/Response Transaction Entities”). The address 
match list is used as an argument in various access 
control statements that are available for use in BIND 
configuration files. There are separate access control 
statements for each type of DNS transaction. The syn- 
tax of the various access control statements and the 
DNS transaction for which each is used are given in 
Table 1. 

The purpose of each of these access control statements 
is as follows: 


e allow-query: specifies the list of hosts allowed to que- 
ry the name server as a whole or a particular zone 
within the name server 

e allow-recursion: specifies the list of hosts allowed 
to submit recursive queries to the name server as 
a whole or to a particular zone served by the name 
server 

e allow-transfer: specifies the list of hosts allowed to 
initiate zone transfer requests to the name server as 
a whole or to a particular zone within the name serv- 
er. This statement is predominantly required for con- 
figuration of master name servers. 

e allow-update: specifies the list of hosts allowed to ini- 
tiate dynamic update requests 

¢ allow-update-forwarding: specifies the list of hosts al- 
lowed to forward dynamic update requests (regard- 
less of the originator of the requests) 

e allow-notify: specifies the list of hosts from which to 
accept DNS NOTIFY messages indicating changes 
in the zone file. This list is relevant only for configura- 
tion of secondary slave name servers. 

e blackhole: specifies the list of hosts that are black- 
listed (barred) from initiating any transaction with this 
name server. Used only in an options server-wide 
ACL statement. 


The foregoing access control statements are, in fact, 
substatements that can be used in the context of options 


Access Control Statement Syntax DNS Transaction 


allow-recursion { address_match_list } Recursive Query 


allow-update { address_match_list } Dynamic Update 


allow-notify { address_match_list } DNS Notify 
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and zone statements in the BIND 9 and higher configu- 
ration file (with the exception of blackhole). When they 
are used within the zone statement, they specify access 
control restrictions for the corresponding DNS transac- 
tion for that specific zone. When they are used as part 
of the options statement, they specify access control re- 
strictions for the corresponding DNS transaction for the 
name server as a whole (because a name server could 
host multiple zones). 

NSD has a similar set of configuration options for cer- 
tain transactions. NSD has a more limited set of options 
and currently can only restrict zone transfers. In the fol- 
lowing sections, if a comparable set of options exists for 
NSD, they will be listed. Otherwise, it should be noted 
that a comparable option does not exist at the time of 
writing. 


Restricting DNS Query/Response Transaction 
Entities 

An example of the usage of the allow-query substate- 
ment (to specify restrictions for the DNS query/response 
transaction stating the IP addresses/subnets from which 
DNS queries are accepted) both at the server- and at 
the zone level (for the zone example.com) is given 
below: 


options { 
allow-query { 254.10.20.10; 239.10.30.29/25; }; 


zone “example.com.” { 

type master; 

file “zonedb.example.com”; 

allow-query { 192.249.249.1; 192.249.249.4; }; 
}; 


Specifying the list of IP addresses and IP prefixes within 
the options and zone statements could clutter the config- 
uration file. Furthermore, the list of IP addresses and IP 
prefixes could be the same for many of the access con- 
trol statements within a name server, and errors could 
be introduced if any additions or subtractions are made 
for that list. To avoid these problems, BIND provides a 
means to create named address match lists, which are 
called access control lists (ACL). These ACLs can be 
used in place of the list of IP addresses/IP prefixes (in 
the address match list argument) in the access control 
statements. 

The ACLs are created by using the aci statement in 
BIND 9 and higher. The general syntax of the aci state- 
ment is as follows: 
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acl acl-list-name { 


address _match_list 


The is a user-defined string (e.g., 
internal hosts). Th@ address match list can be 
a list of IP addresses, IP address prefixes (denoting 
subnets), or cryptographic keys. An example of an acl 
statement that uses an IP address and a subnet refer- 
ence in address _ match _ list is given below. In the ex- 
ample, 254.10.20.10 denotes the IP address of a host, 
and the IP prefix 239.10.30.0/24 denotes a class C sub- 
net. 


acl-list-name 


acl “internal hosts” { 
254.10.20.10; 
239.10.30.0/24; 

hi 


The use of ACL internal _ hosts in place of the list of 
IP addresses/IP prefix in the options and zone statement 
given above is as follows: 


options { 


allow-query { internal hosts; }; 


zone “example.com.” { 
type master; 
file “zonedb.example.com”; 
allow-query { internal hosts; }; 
}; 


The address match list parameter in an access control 
statement can contain any of the following values: 


¢ An IP address or list of IP addresses 
¢ An IP prefix or list of IP prefixes 

« ACLs 

« Acombination of the above three. 


The definition of ACLs forms a critical element in the 
configuration of DNS transaction restrictions. Hence, it 
is a good operational practice for the DNS administra- 
tor to define and create ACLs pertaining to different DNS 
transactions. 


Checklist #7 

It is recommended that the administrator create a named 
list of trusted hosts (or blacklisted hosts) for each of the 
different types of DNS transactions. In general, the role of 
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the following categories of hosts should be considered for 
inclusion in the appropriate ACL: (1) DMZ hosts defined 
in any of the zones in the enterprise; (2) All secondary 
name servers allowed to initiate zone transfers; (3) Inter- 
nal hosts allowed to perform recursive queries. 

In addition to IP address, IP prefix, or ACL, the address 
match list parameter in the access control statements can 
take on any of the following special values: 


* none: matches no hosts 

* any: matches all hosts 

¢ localhost: matches all IP addresses of the server on 
which the name server is running 

¢ localnets: matches all IP addresses and subnet 
masks of the server on which the name server is run- 
ning. 


Following are a few more examples of commands for 
creating ACLs and the use of ACLs within options and 
zone Statements: 


acl “local_hosts” { 
254.10.20.10; 
239.10.30.29/25; 


acl “fake-net” { 
0.0.0.0/8; 
1.0.0.0/8; 


options { 
allow-query { any; }; 
blackhole { fake-net; }; 


zone “example.com.” { 
type master; 
file “zonedb.example.com”; 
allow-query { local hosts; }; 
}; 


In the named.conf snippet above, two ACLs, iocal _ 
hosts and fake-net, have been specified. DNS queries 
from any hosts are allowed at the server level. No trans- 
actions are permitted from the hosts included under 
fake-net. Queries to the zone example.com can be ini- 
tiated only by the hosts included under the ACL 1tocal _ 
hosts because any restriction specified under the zone 
(zone-specific) statement overrides the restriction spec- 
ified under the options (Server-wide) statement. 
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Key material can also be used in ACL statements. This 
would indicate that only hosts knowing (and using) the 
shared key (or key pair) would be able to communicate. 
How a key is used in an ACL is discussed in “Defining the 
Keys in the Communicating Name Servers.” 

NSD does not have a feature to define ACLs as a means 
of only allowing queries from designated sets of hosts. 


Restricting Recursive Queries 

Authoritative name servers provide name resolution ser- 
vice from their own data and are supposed to provide 
this service for any DNS client. Hence, configuring an 
authoritative name server to accept queries from a re- 
stricted set of hosts does not make sense. The practical 
security protection for an authoritative name server is to 
turn off the query recursion feature, so that the authorita- 
tive name server does not poison its cache by querying 
other (possibly compromised) name servers. A local re- 
solving/recursive name server can be configured to ac- 
cept queries only from internal hosts, to protect it from 
denial-of-service attacks as well as cache poisoning. 
However, there may be situations in which it is economi- 
cally infeasible to dedicate separate servers for authori- 
tative service and resolution service, and the resolving 
name server has to perform as authoritative server for 
one or more zones. In this situation, the following strat- 
egies are possible within the BIND 9 and higher name 
server: 


¢ Restricting all queries accepted by the server to a 
specified set of IP addresses of internal clients and 
then overriding this set only for authoritative zones 
so that any DNS client can obtain information for re- 
sources in that zone. 

¢ Restricting recursive queries to a specified set of IP 
addresses of internal clients through a direct configu- 
ration option 

¢ Serving different responses (data) to different clients 
by defining views. 


Restriction at server level with override for authoritative 
zones 

In this strategy, the allowable set of internal clients who 
can submit queries to the name server is specified through 
the acl statement as follows: 


acl internal_hosts {192.158.43.3; 192.158.43.6; 
192.158.44.56;}; 


The server-wide option would be to restrict all queries to 
the list of clients: 
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options { 
allow-query { internal hosts; }; 
- or - 
allow-recursion { internal_hosts; }; 


}; 


The option can be overridden by specifying zones for 
which this name server is authoritative (thus allowing 
queries to that zone from all clients): 


zone “example.com” { 
type master; 
file “zonedb.example.com”; 
allow-query { any; }; 

}; 


Restricting all recursive queries to a specified set of IP 
addresses 
Server-wide restriction: 


options { 
allow-recursion { internal_ hosts; }; 


}; 


Restricting recursion through views 

The purpose of creating views is to create a logical par- 
tition made up of a combination of clients (based on 
IP addresses) and zones for which recursive queries 
will be supported and those for which they will not be 
supported. 

In the following example, the view nc 
view is enabled to define the scope of IP address- 
es and zones that are permitted to submit recursive 
queries; no recursion view is meant for disallowing 
recursion. 


recursion 


view recursion view { 
match-clients { internal hosts; }; 
recursion yes; 


}; 


view no_recursion view { 
match-clients { any; }; 
recursion no; 


}F 


It should be noted that NSD (as of the time of writing) is 
an authoritative only DNS server. 

Therefore, an NSD server will never act as a recursive 
server and only serve authoritative information from the 
zones it is configured to serve. 
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Restricting 

Zone Transfer Transaction Entities 

Authoritative name _ servers’ (especially primary 
name servers) should be configured with an allow- 
transfer access control substatement  designat- 


ing the list of hosts from which zone transfer requests 
can be accepted. These restrictions address the de- 
nial-of-service threat and potential exploits from un- 
restricted dissemination of information about internal 
resources. 

Based on the need-to-know principle, the only name 
servers that need to refresh their zone files periodical- 
ly are the secondary name servers. Hence, zone trans- 
fers from primary name servers should be restricted to 
secondary name servers. The zone transfer should 
be completely disabled in the secondary name serv- 
ers. The address match list argument for the allow- 
transfer Substatement should consist of IP addresses 
of secondary name servers and stealth secondary name 
servers. 

The command to create an ACL valid secondary NS 
with the IP addresses of three secondary name servers 
is as follows: 


acl “valid_secondary NS” { 
224.10.229.5; 
224.10.235.6; 
239.10.245.25; 

he 


The allow-transfer Substatement can be used in a 
zone Statement and in an options statement. When it is 
used in a zone Statement, it can restrict zone transfer 
for that zone; when it is used in an options statement, 
it can restrict zone transfer for all zones in the name 
server. 

The allow-transfer Substatement at the server level is 
as follows: 


options { 
allow-transfer { “valid_secondary NS”; }; 
he 


The allow-transfer substatement at the zone level is as 
follows: 


zone “example.com” { 

type master; 

file “zonedb.example.com”; 

allow-transfer { “valid secondary NS”; }; 
}; 
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The foregoing statements apply to primary name serv- 
ers. In the secondary and stealth secondary name 
servers, zone transfer should be disabled as shown be- 
low: 


zone “example.com” { 
type slave; 
masters { 224.239.5.1; }; 
file “zonedb_bak.example.com”; 
allow-transfer { none; }; 
}; 


Restricting Zone Transfer in NSD 

NSD has a similar set of tools to restrict zone transfers 
to only a chosen set of slave servers. Like BIND, the ad- 
ministrator should learn and use the options available in 
the NSD configuration file. There is no way to create ac- 
cess control lists (ACLs), but an administrator can list the 
individual IP addresses of slave servers in the zone state- 
ments in the NSD configuration file. 

In the configuration file, the provide-xfr statement is 
used in the zone statement block of the nsd.coné file, 
much like a combined masters statement and allow- 
transfer Statement in BIND configuration files: 


zone: 
#fallow transfer from subnet 
provide-xfr: 169.192.85.0/24 
#prevent transfer from specific IP address in block 
provide-xfr: 169.192.85.66 BLOCKED 


Only one IP address should appear in a provide-xfr 
statement, but the address can be an entire subnet. The 
provide-xfr statement allows transfers; all other transfer 
requests are rejected by default. 


Restricting Dynamic Update Transaction 
Entities 

Dynamic updates on a zone file can be directed only to 
the copy of the zone file that resides on the primary name 
server for the zone (i.e., where the master zone file re- 
sides). By default, dynamic update is turned off in both 
BIND 8 and higher. Dynamic updates are enabled or re- 
stricted by using one of the following two statements in 
BIND: 


¢ allow-update 
* update-policy (available only in BIND 9 and higher). 


These statements can be specified only at the zone lev- 
el, not at the server level. Hence, these statements are 
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substatements within the zone statement. The allow- 
update substatement enables specification of dynamic 
update restrictions based on IP addresses and a shared 
secret (also called a TS/G key[1]). The use of the allow- 
update Statement using IP addresses alone is addressed 
in this section. The use of the allow-update statement 
using TSIG keys is described in “Securing Zone Trans- 
fers using TSIG.” 

The update-policy statement enables specification of 
dynamic update restrictions based on TSIG keys only, but 
it enables specification of update restrictions at a finer lev- 
el of granularity. The allow-update substatement implies 
update access rights to all records of a zone; the update- 
policy substatement can be used to restrict update ac- 
cess rights to one or more designated RRTypes (e.g., A 
RRs). 

To use the allow-update statement, an address match 
list must be created. The command to create an ACL pu_ 
Allowed List with one IP address is as follows: 


acl “DU_Allowed_List” { 
192.249.12.21; 


The ACL pu_ Allowed List (consisting of IP address- 
es of hosts allowed to send dynamic update requests for 
updating the contents of the zone example.com) is used 
within the allow-update Substatement of the zone state- 
ment as follows: 


zone “example.com” { 
type master; 
file “zonedb.example.com”; 
allow-update { “DU Allowed List”; }; 
}; 


Dynamic update requests generally originate from 
hosts such as DHCP servers that assign IP addresses 
dynamically to hosts. Once they assign an IP address 
to a new host, they need to store the FQDN-to-IP ad- 
dress mapping (by creating an A RR) and address-to- 
FQDN mapping (by creating a PTR RR) information in 
the primary authoritative name servers for the zones. 
Creation of this information occurs through dynamic up- 
dates. 

As of the time of writing, NSD does not support dynamic 
update so there are no comparable configuration options 
for NSD. All dynamic update messages sent to a DNS 
server running NSD will be rejected. Updates to a zone 
must be done offline and then then the server signaled to 
reload the new modified zone. 
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Restricting BIND DNS NOTIFY Transaction 
Entities 

Once zone transfers have been set up between servers, 
it is a good idea to make sure that secondary name serv- 
ers are informed about changes to zone file data through 
a notification message. By default, a notification message 
is sent whenever a primary name server detects a change 
in the zone file. It sends a DNS NOTIFY message to ev- 
ery name server listed in the NS RRSet in the zone, be- 
cause they are the recognized secondary name servers of 
the zone. DNS administrators should keep notification on, 
as this configuration will allow updates to be propagated 
quickly to secondary name servers. If the DNS adminis- 
trator wants to turn off the functionality for a specific zone, 
however, the notify substatement should be used in the 
zone Statement of that zone: 


zone “example.com” { 

type master; 

notify no; 

file “zonedb.example.com”; 
}; 


If there are any additional servers to which the zone ad- 
ministrator wants the DNS NOTIFY message to be sent 
(e.g., a stealth slave server), the also-notify substate- 
ment should be added to the zone statement, and the IP 
addresses of the additional servers should be specified 
as its parameter values, as shown below: 


zone “example.com” { 
type master; 
algo-notify {1 192.168.25.2% 3 
file “zonedb.example.com”; 

}; 


The receiver of the DNS NOTIFY message, the sec- 
ondary name server, allows notify messages only from 
the primary name server by default. (Recall that the 
secondary name server is made aware of its primary 
name server through the masters substatement in the 
zone Statement.) If the secondary name server wants 
to receive notify messages from additional servers, 
the allow-notify substatement in the zone statement 
must be added, and then the IP addresses of those 
servers must be specified in that substatement, as 
follows: 


zone “example.com” { 


type slave; 
allow-notify { 193.168.25.4; }; 
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file “zonebak.example.com”; 
masters { 192.168.25.1; }; 
}F 


Restricting NSD DNS NOTIFY Transaction 
Entities 

There are two statements (both placed in the zone: state- 
ment block) that a DNS administrator can use to send 
DNS NOTIFY messages or restrict listening for DNS NO- 
TIFY messages to a particular IP address (a master serv- 
er in the case of NSD acting as a slave server). 

To configure NSD to send DNS NOTIFY messages to a 
particular IP address (either a slave secondary or a stealth 
secondary) and a particular TSIG key or the option noxey if 
no TSIG is used, the following is added to the zone: state- 
ment block in the NSD configuration file: 


zone: 


notify: 10.0.0.10 NOKEY 


The configuration of a slave server, accepting notifica- 
tion messages only from specific IP addresses, would 
look like the following: 


zone: 


allow-notify: 10.11.12.13 NOKEY 


Transaction Protection through Hash-Based 
Message Authentication Codes (TSIG) 

The process of authenticating the source of a message 
and its integrity through hash-based message authenti- 
cation codes (HMAC) is specified through a set of DNS 
specifications known collectively as TSIG. The term 
HMAC is used to denote both the message authen- 
tication code generated by using a keyed hash func- 
tion and the hash function itself. HMAC is specified in 
RFC 2104 and generalized in the NIST document FIPS 
198-1. 

An HMAC function uses two parameters — a mes- 
sage input and a secret key — and produces an output 
called a message authentication code (MAC) or hash. 
The sender of the message uses the HMAC function 
to generate a MAC and sends this MAC along with the 
message to the receiver. The receiver, who shares the 
same secret key, uses the key and HMAC function used 
by the sender to compute the MAC on the received mes- 
sage. The receiver then compares the computed MAC 
with the received MAC; if the two values match, it pro- 
vides assurance that the message has been received 
correctly and that the sender belongs to the community 
of users sharing the same secret key. Thus, message 
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source authentication and integrity verification are per- 
formed in a single process. 

The hash algorithm, which forms the primitive for the 
hash function, generates a fixed-size MAC or hash from a 
message of arbitrary size. The HMAC function for TSIG is 
specified in RFC 2845 and was extended in RFC 4635 to 
support more hash algorithms (SHA-1 and SHA-2 family 
of algorithms). 

Transaction protection through HMAC, using a shared 
secret, is not a scalable solution. This is the reason the 
TSIG specification is largely used only for zone transfer 
and dynamic update transactions. These DNS transac- 
tions are either between servers in the same administra- 
tive domain, or between servers in domains with previ- 
ously established interactions interconnections. 

The MAC, or hash value generated by the sender of the 
DNS message, is placed in a new RR called a TSIG re- 
cord that is added to the DNS message. The TSIG record, 
in addition to the generated hash, contains the following: 


¢ Name of the hash algorithm used 

* Key name 

* Time the hash was generated (time stamp) 

¢ “Fudge factor” — time in seconds (usually 5 minutes) 
to use as delta on either side of the time generated 
for which the TSIG signature should be considered 
valid; used to account for possible clock skew be- 
tween hosts. 


The time stamp field specifies the time at which the 
MAC was generated. The purpose of this field is to pro- 
tect against replay attacks. In a replay attack, the attack- 
er could capture the packet containing the MAC, and 
send it after a period of time. To ensure that this does 
not happen, the recipient reads the MAC generation time 
and the current clock time, and verifies whether the MAC 
was generated within an “allowable expiry time,” which is 
computed using the “Fudge Factor”. 

The “Fudge factor” field specifies the duration of time af- 
ter the MAC generation time, the message can be consid- 
ered valid. It is computed by applying a “fudge factor” on 
the MAC generation time (adding or subtracting a small 
number of seconds) to allow for clock skew (mismatch) 
between the MAC generator and verifier hosts. 

To have a secure transaction based on TSIG, a send- 
er computes the hash of the entire DNS message and 
secret key, and encodes the result in a TSIG RR at the 
end of the message. At the recipient end, the TSIG re- 
cord is stripped from the DNS message and processed. 
The process whereby the recipient uses the TSIG re- 
cord to verify the integrity of the received DNS message 
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is called verification. The verification process uses the 
hash algorithm name to identify the hash function, and 
the key name to identify the key to be used to validate 
the TSIG record. The number of the fudge factor is used 
to add to, or subtract from the signing time, to allow for 
the possible mismatch of clocks of the signer and veri- 
fier. Thus, the fudge factor provides the tolerance limit 
for the MAC validity period computed, based on the time 
of generation. 

The purpose of sending the key name in the TSIG re- 
cord is to enable the verifier (recipient) of the DNS mes- 
sage to use the right key to verify it. It also enables the 
recipient to verify that the key name is indeed one of the 
keys shared with the sender. The purpose of the “Time 
Signed” or time stamp field in the TSIG record is to inform 
the message recipient about the time of MAC generation. 
The recipient compares this value with the current clock 
time at the recipient system to ensure that the MAC was 
generated within the allowable time specified as part of 
the TSIG record itself. The purpose of using a time stamp 
is to prevent replay attacks. For correct verification of the 
generation time against the current time, it is essential 
that the system clocks of the transaction participants to be 
synchronized. Protocols such as the Network Time Proto- 
col (NTP) are available for this purpose. 

The verification process consists of the recipient retriev- 
ing the appropriate secret key, generating its own hash 
of the received DNS message, and comparing it with the 
received hash (in the TSIG record). In this verification pro- 
cess, the receiving name server has performed the follow- 
ing validations: 


* The message has been verified as coming from an 
authenticated source (with whom it shares the secret 
key). 

* The message has not been altered in transit (verified 
by matching hash values). 


Source authentication counters identify spoofing, and 
data integrity checking helps to counter corruption and 
modification of data in transit. 

BIND version 8.2 was the first version to introduce TSIG 
features, and is present in every later version. Support for 
TSIG in BIND 9 and higher includes features to secure 
zone transfer and dynamic update transactions [2]. 

The following operations are needed to set up the envi- 
ronment for enabling DNS transactions to use TSIG: 


« The system clocks of the name servers (primary and 
secondary) participating in DNS transactions must be 
synchronized (e.g., through NTP). 
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e« There should be a secret key generation utility that 
can generate keys of the required length with suffi- 
cient entropy. The key file, (the file containing the se- 
cret key string), must be securely communicated to 
the two servers participating in the transaction. 

¢ The key information should be specified in the con- 
figuration file through appropriate statements (e.g., by 
the key statement and server statement in the named. 
conf configuration file of BIND 9 and higher). 


The key generation process is described in “Key Gen- 
eration.” The commands needed to define the keys and 
instruct the name server to use those keys for all DNS 
transactions are outlined in “Defining the Keys in the 
Communicating Name Servers” and “Instructing Name 
Servers to Use Keys in All Transactions.” The set of 
checklists for key file creation and key definition with- 
in the name servers is given in “Checklists for Key File 
Creation and Key Configuration Process.” Protection of 
zone transfer transactions and dynamic update transac- 
tions using HMAC, as specified in TSIG, are covered in 
“Securing Dynamic Updates Using TSIG or SIG(0)” and 
“Configuring Dynamic Update Forwarding Restrictions 
Using TSIG Keys.” 


Key Generation 

To enable zone transfer (requests and responses) through 
authenticated messages, it is necessary to generate a key 
for every pair of name servers. The key can also be used 
for securing other transactions, such as dynamic updates, 
DNS queries, and responses. 

The binary key string, that is generated by most key gen- 
eration utilities used with DNSSEC, is base64 encoded. 
The program that generates the key in BIND 9 and higher 
iS dnssec-keygen. An example of a command that gener- 
ates a secret key (as opposed to other types of keys, such 
as public keys, which this program can also generate) by 
invoking the anssec-keygen program is as follows: 


dnssec-keygen -a HMAC-SHA256 -b 112 -n HOST nsil-ns2. 


example.com. 


where the various command options (parameters) de- 
note the following: 


* -a option: the name of the hashing algorithm that 
will use the key (HMAC-SHA256 is preferred, but 
may not be available in older implementations. Use 
of HMAC-SHAY1 is allowed, but migration to HMAC- 
SHA256 should be done when available) 

* -b option: the length of the key (here — 112 bits) 
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* -n option: the type of key (in this case, the Host) 
¢ last parameter: the name of the key (nsi-ns2.example. 


com) 


The dnssec-keygen program generates the following files, 
each containing the key string: 


Knsl-ns2.example.com.+157+34567.key 
Knsl-ns2.example.com.+157.34567.private 

When the program is generating a pair of keys (one public 
and the other private), the file with the extension key will 
contain the public key string and the file with extension 
private will contain the private key. Because in this case 
only the secret key is being generated, the key strings in 
both files will be the same for the TSIG implementation. 
The key string from any of these files is then copied to a 
file called the key file. This file is then referenced using an 
include statement within the key statement. 


Defining the Keys in the Communicating Name 
Servers 

The key generated by using the dnssec-keygen utility has 
to be defined within the named. cont configuration file of the 
two communicating servers (generally one primary name 
server and one secondary name server). This is accom- 
plished by using the key statement of BIND: 


key “nsl-ns2.example.com.” { 

algorithm hmac-sha256; 

include “/var/named/keys/secretkey.conf”; 
}e 


where the file secretkey.conf will contain the keyword 
secret and the actual key string (in this example): 


secret “MhZQKc4TwAPkURM=="; 


Defining the Keys in an NSD Configuration File 
In NSD, declaring a TSIG key is very similar to the exam- 
ple above, with some minor syntax changes: 


key: 
name: nsl-ns2.example.com. 
algorithm: hmac-sha256 
secret: “MhZQKc4TwAPkURM==” 
Instructing Name Servers to Use Keys in All 
Transactions 
The command to instruct the server to use the key in all 
transactions (DNS query/response, zone transfer, dynam- 
ic update, etc.) is as follows: 
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server 192.249.249.1 { 
keys { nsl-ns2.example.com.; }; 
}F 


The same statement can be used as an entry in an acl 
statement as well: 


acl key acl { 
nsl-ns2.example.com.; 


}e 


Checklists for Key File Creation and Key 
Configuration Process 

In each of the configuration files of the pair of servers 
that share a secret key in a zone, the name of the key 
to be used for all communication between them must be 
specified (via the server statement in BIND configura- 
tion files). 


Checklist #8 

The TSIG key should be a minimum of 112 bits in length, 
if the generator utility has been proven to generate suf- 
ficiently random strings. The generated TSIG key may 
have to be longer to insure at least 112 bits of security. 


Checklist #9 

A unique TSIG key should be generated for each pair of 
communicating hosts (i.e., a separate key for each sec- 
ondary name server to authenticate transactions with the 
primary name server, etc.). 


Checklist #10 

After the key string is copied to the key file in the name 
server, the two files generated by the dnssec-keygen pro- 
gram should either be made accessible only to the server 
administrator account (e.g., root in UNIX) or, better still, 
deleted. The paper copy of these files also should be de- 
stroyed. 


Checklist #11 

The key file should be securely transmitted across the 
network to name servers that will be communicating with 
the name server that generated the key. 


Checklist #12 

The statement in the configuration file (usually found at / 
etc/named.conf for BIND running on UNIX) that describes 
a TSIG key (key name (ID), signing algorithm, and key 
string) should not directly contain the key string. When the 
key string is found in the configuration file, the risk of key 
compromise is increased in some environments where 
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there is a need to make the configuration file readable by 
people other than the zone administrator. Instead, the key 
string should be defined in a separate key file and refer- 
enced through an include directive in the key statement 
of the configuration file. Every TSIG key should have a 
separate key file. 


Checklist #13 

The key file should be owned by the account under 
which the name server software is run. The permission 
bits should be set so that the key file can be read or 
modified only by the account that runs the name server 
software. 


Checklist #14 

The TSIG key used to sign messages between a pair of 
servers should be specified in the server statement of 
both transacting servers to point to each other. This is 
necessary to ensure that both the request message and 
the transaction message of a particular transaction are 
signed and hence secured. 


Securing Zone Transfers using TSIG 

The pair of servers participating in zone transfer transac- 
tions must be instructed to use the key defined using the 
key Statement (see “Defining the Keys in the Communi- 
cating Name Servers”). This pair generally consists of a 
primary name server and a secondary name server. The 
primary name server is configured to accept zone trans- 
fer requests only from secondary name servers that send 
MACs using the named key along with a zone transfer re- 
quest message. 

The configuration is accomplished by using the allow- 
transfer substatement of the zone statement. A sam- 
ple allow-transfer Substatement that specifies that 
the primary name server should only allow zone trans- 
fer requests for the example.com zone from name 
servers that use the nsi-ns2.example.com key iS as 
follows: 


zone “example.com” { 

type master; 

file “zonedb.example.com”; 

allow-transfer { key {nsl-ns2.example.com.}; }; 
}; 


The secondary name server is instructed to use the key 
nsl-ns2.example.com in the zone transfer request to the 
primary name server (with IP address 192.249.249.1) us- 
ing the server statement shown in “Defining the Keys in 
an NSD Configuration File.” 
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In NSD, the syntax is similar, but there is a special re- 
quirement if no TSIG key is to be used. The zone option 
provide-xfer is used to indicate which IP addresses can 
request a zone transfer for this zone on this server: 


zone: 
type: master; 
file “zonedb.example.come”; 
provide-xfer: 192.68.0.1 nsl-ns2.example.com. 
provide-xfer: 192.68.0.1 NOKEY 


Securing Dynamic Updates Using TSIG or SIG(0) 
Dynamic update restrictions based on TSIG keys can be 
specified in BIND 8.2 and higher versions [3] by using the 
allow-update substatement of the zone statement. The ar- 
guments to this statement are the keyword key followed 
by the name of the TSIG key. (See “Defining the Keys in 
the Communicating Name Servers” for details on how to 
enter in the key statement in a BIND name server configu- 
ration file.) Once the key statement has been entered, the 
following substatement can be added to the zone state- 
ment to make use of the secret key for dynamic updates: 


zone “example.com” { 

type master; 

file “zonedb.example.com”; 

allow-update { key dhcp-server.example.com.; }; 
}; 


Note that although the string dncp-server.example.com. 
looks like a FQDN, it actually denotes the name of the 
TSIG key. The implication of the configuration statement 
example is that any hosts that possess the key named 
dhcp-server.example.com. can submit dynamic update 
requests (adding, deleting, or modifying RRs) to the zone 
file (for the zone example.com) that resides in the primary 
authoritative name server. 

To use SIG(O) to authenticate dynamic update mes- 
sages, the key used must first have its public component 
stored in the DNS, so a validating client can obtain it.[4] 
We'll discuss publishing public keys and setting up trust 
anchors in Part 5. The previous steps above need to be 
performed to control access (if desired). After that is done, 
the updating name server should be able to obtain the 
key and process the dynamic update request (if the name 
server supports SIG(O) with dynamic update). 


Configuring Dynamic Update Forwarding 
Restrictions Using TSIG Keys 

Dynamic updates are allowed on the copy of the zone 
file in the primary authoritative name server only be- 
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cause that is the only “writable” copy. This does not au- 
tomatically imply that the primary authoritative name 
server is the only one allowed to accept dynamic up- 
date requests. 

In fact, BIND 9.1 and higher versions allow second- 
ary name servers to accept dynamic update requests 
and forward them to the primary authoritative name 
server. 

In this scenario, if there are no restrictions on the basis of 
the identity of hosts from whom the secondary name server 
can forward such dynamic update requests, it is equivalent 
to circumventing the dynamic update restrictions specified 
in the primary name server because the request can literally 
originate from any host to the secondary name server and 
be forwarded to the primary name server. To counter this 
problem, a new substatement, allow-update-forwarding, 
is now available in BIND versions that have the dynam- 
ic update forwarding feature. An example of this aiiow- 
update-forwarding statement using TSIG keys is given 
below: 


zone “example.com” { 
type slave; 
file “backupdb.example.com”; 
allow-update-forwarding { key dhcp-server.example.com.; 
}; 
i 


Configuring Fine-Grained Dynamic 
Update Restrictions Using TSIG/SIG(0) Keys 
The allow-update Substatement specifies dynamic up- 
date restrictions based on the originators of dynamic 
update requests (a specific set of hosts identified by 
IP address or holding a TSIG key), but not the con- 
tents of the zone records. To specify dynamic update 
access (grant or deny) restrictions based on a com- 
bination of domain/subdomain names and RR types 
(A, MX, NS, etc.), BIND 9 and higher versions provide 
the update-policy substatement within the zone state- 
ment. The update-policy substatement bases these re- 
strictions on the TSIG key. In other words, the update- 
policy Statement specifies which TSIG keys (or holders 
of keys) are allowed to perform dynamic updates on 
which domains/subdomains and RR types within that 
domain/subdomain. 

The general form of the update-policy statement is as 
follows: 


update-policy { 
(grant | deny) TSIGkey nametype name [type] 
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where the semantics of each of the statement compo- 
nents are as follows: 


* grant/deny — allow/disallow dynamic update for the 
combination that follows 

¢« TSIGkey — the name of the TSIG key used to authen- 
ticate the update request 

* nametype — can be one of the following with the as- 
sociated semantics: 

* name — restriction applies to the domain name speci- 
fied in the following name field 

¢« subdomain — restriction applies to subdomains of the 
domain specified in the following name field 

¢ wildcard — restriction applies to the set of domains 
specified using the wildcard syntax (i.e., *) in the fol- 
lowing name field 

¢ self — restriction applies to the domain whose name 
is the same as that in the TSIG key field (i.e., the do- 
main name whose records are to be updated has the 
same name as the key used to authenticate the dy- 
namic update request). In this usage, the contents of 
the name field become redundant but still should be 
used in the statement (i.e., the name field cannot be 
left blank) 

* name — used to specify the name of the domain. The 
syntax used and the domains it covers are based on 
the value used in the nametype field (e.g., if subdo- 
main is the value of the nametype field, then all sub- 
domains of the domain name used are being covered 
under this statement). 

« type — an optional field that can contain any valid 
RRType (except the NSEC type) or the wildcard type 
‘ANY’ (ANY stands for all RR types except the NSEC 
type). If it is missing, it denotes all RR types, except 
SOA, NS, RRSIG, and NSEC. It also is possible to 
put in multiple RRTypes separated by a space (e.g., A 
NS). 


Examples of update-policy statements and their associ- 
ated semantics are given below. 

Suppose there is a domain sales.example.com with- 
iN example.com and that name server uses a TSIG 
key that has the same name as its own domain 
name (i.€., sales.example.com). All dynamic updates 
from sales.example.com Could be restricted to all re- 
source records of that domain within the zone file as 
follows: 


zone “example.com” { 


type master; 


file “zonedb.example.com”; 
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update-policy { grant sales.example.com. self sales. 
example.com.; }; 


}e 


All dynamic updates from sales.example.com Could be re- 
stricted to only A and MX RR types of that domain as fol- 
lows: 


zone “example.com” { 
type master; 
file “zonedb.example.com”; 
update-policy { grant sales.example.com. self sales. 
example.com. A MX; }; 
}; 


To allow clients with the TSIG key sales.example.com to 
update all records pertaining to subdomains of Nesales. 
example.com except the name server records (RR Type 
NS): 


zone “example.com” { 
type master; 
file “zonedb.example.com”; 
update-policy { 
deny sales.example.com. subdomain NE sales.example. 
com. NS; 
grant sales.example.com. subdomain NE sales.example. 
com. ANY; 


}; 


For Microsoft Windows, authentication is provided us- 
ing GSS-TSIG. System administrators of Windows Serv- 
ers should consult their implementation’s documentation 
on how to integrate secure dynamic update using GSS- 
TSIG. 


Summary 

Transaction Signature (or TSIG) is a protocol defined in 
RFC 2845. It’s used by DNS to provide a means of authen- 
ticating updates to a dynamic DNS database, although it 
can also be used between servers, and for regular que- 
ries. TSIG uses shared secret keys and one-way hashing 
to provide a cryptographically secure means of identifying 
each endpoint of a connection as being allowed to make, 
or respond to a DNS update. 

Although queries to DNS may be made anonymously, 
updates to DNS must be authenticated, since they make 
lasting changes to the structure of the Internet naming 
system. The use of a key shared by the client making the 
update and the DNS server, guarantees the authentic- 
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key. 


en-us/library/cc753751.aspx 


[5] http:/www.caida.org/outreach/papers/2003/dnsspectroscopy/ 
[6] http://public.as112.net/ 


[1] The term TSIG Key (while commonly used) is not technically correct, as it refers to a shared secret string and not a cryptographic 

[2] Some server software such as Microsoft Windows Server 2008 does not implement TSIG, but use lower level transaction security 
(such as IPSec). To set this up, see http://technet.microsoft.com/en-us/library/ee649243(WS.10).aspx 

[3] As mentioned earlier, Micosoft Server does not use TSIG for dynamic updates, but uses IPSec instead. See: http://technet.microsoft.com/ 


[4] For more information, see the note from the dynamic update howto: http://ops.ietf.org/dns/dynupd/secure-ddns-howto.html 


ity of the update request. However, the update request 
may be passing over an insecure channel (the Internet). 
A one-way hashing function is used to prevent malicious 
observers from learning the secret key and using it to 
make their own modifications. 

A time stamp is included in the TSIG protocol to prevent 
recorded responses from being reused, which would al- 
low an attacker to breach the security of TSIG. This plac- 
es a requirement on dynamic DNS servers and TSIG cli- 
ents to contain an accurate clock. Since DNS servers are 
connected to a network, the Network Time Protocol may 
be used to provide an accurate time source. 

Although TSIG is widely deployed, there are several 
problems with the protocol: 


* It requires distributing secret keys to each host which 
must make updates. 

* The HMAC-MD5 digest is only 128 bits. 

¢ There are no levels of authority. Any host with the se- 
cret key may update any record. 

* Asa result, a number of alternatives and extensions 
have been proposed. 

e RFC 2137 specifies an update method using a pub- 
lic key “SIG” DNS record. A client holding the corre- 
sponding private key can sign the update request. 
This method matches the DNSSEC method for se- 
cure queries. However, this method is deprecated by 
RFC 3007. 

« In 2003, RFC 3645 proposed extending TSIG to al- 
low the Generic Security Service (GSS) method of 
secure key exchange, eliminating the need for man- 
ually distributing keys to all TSIG clients. The meth- 
od for distributing public keys as a DNS resource re- 
cord (RR) is specified in RFC 2930, with GSS as 
one mode of this method. A modified GSS-TSIG — 
using the Windows Kerberos Server — was imple- 
mented by Microsoft Windows Active Directory serv- 
ers and clients called Secure Dynamic Update. In 
combination with poorly configured DNS (with no 
Reverse Lookup Zone) using RFC 1918 address- 
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ing, reverse DNS updates using this authentication 
scheme are forwarded en masse to the root DNS 
servers and increase the traffic to root DNS serv- 
ers in the course of doing so [5]. There is an anycast 
group which deals with this traffic to take it away 
from the root DNS servers [6]. 

¢« RFC 2845, which defines TSIG, specifies only one al- 
lowed hashing function HMAC-MD5, which is no lon- 
ger considered to be highly secure. In 2006, propos- 
als were circulated to allow RFC 3174 Secure Hash 
Algorithm (SHA1) hashing to replace MD5. The 160- 
bit digest generated by SHA1 should be more secure 
than the 128-bit digest generated by MD5. 

¢ RFC 2930, which defines TKEY, a DNS Record used 
to automatically distribute keys from a DNS server to 
DNS clients. 

« RFC 3645, which defines GSS-TSIG, uses gss-api 
and TKEY to automatically distribute keys in gss-api 
mode. 

« The DNSCurve proposal has many similarities to 
TSIG. 
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